Preface


Data is changing the way society and technology evolve. With the advent of IoT, Big Data, ML and AI, technology is developing rapidly towards more human-centric applications. The finance and insurance sectors are no exception, and developments in FinTech and InsuranceTech are entering a phase of unique offerings.

It is very important to have a common understanding of the actual conditions in the financial and insurance sectors, and of how technology can help to advance and evolve those conditions in a positive manner. Discussing the principles of the modern economy that make the modern financial sector and FinTech the most disruptive areas in today's global economy helps to build this understanding and knowledge.

Data-driven approaches open many opportunities for activating new channels of innovation on the local and global scale, while at the same time catapulting opportunities for more disruptive human-centric services. Data-driven human-centric applications are also the result of a shared vision arising from the natural evolution of technology and society. Experts in the financial and insurance sectors are looking at a dramatic change in how people think about the global economy, while technology provides the instruments for new ways of understanding it, for building a common vision and for identifying impacts in finance and insurance.

The INFINITECH book series addresses the need for clear information on the foundations, principles and technologies of data-driven finance and insurance, aimed at both technical and non-technical experts who participate in financial and insurance processes, and responds to the constant need for innovation and new services across banks and insurance organizations.
Who Should Read This Book?


Financial & Insurance Regulators 


A unique resource for non-technical experts who participate in the financial regulatory process, covering the core services that enable banks and insurers to share innovation and new services without exchanging any customer data.


General Public & Students 


The means to understand the future of FinTechs and their services, and to identify different methodologies and indicators from a human perspective.


Entrepreneurs and SMEs 


Powerful tools to innovate, increase opportunities and help small businesses and entrepreneurs reach their full innovation potential, provided there is good participation across the banking and insurance sectors.


Technical Experts & Software Developers 


A guidebook to technologies and legacy systems, both open and non-open source, that draws on the most recent European experiences in innovating technology for the financial and banking sectors.


What is Addressed in the Book Series? 


“Concepts and Design Thinking Innovation addressing the Global 
Financial Needs” 


In the first part of the INFINITECH book series we begin by discussing the principles of the modern economy that make the modern financial sector and FinTech the most disruptive areas in today’s global economy. INFINITECH envisions many opportunities emerging for activating new channels of innovation on the local and global scale, while at the same time catapulting opportunities for more disruptive user-centric services. INFINITECH is also the result of a shared vision from a representative global group of experts, providing a common vision and identifying impacts in the financial and insurance sectors.


“Methods and Design Principles for Financial Innovation, Explaining the 
Supply Side for Interoperability in Finance- and Insurance-Tech” 


In the second part of the series we review the basic concepts of FinTech, which refers to the diverse uses of technology to underpin the delivery of financial services. The demand and supply sides of the financial sector are presented, and we discuss why FinTech is the focus of industry today and what the successive waves of digitization mean. Financial technology (FinTech) and insurance technology (InsuranceTech) are rapidly transforming the financial and insurance services industry. We provide an overview of the Reference Architecture (RA) for BigData, IoT and AI applications in the financial and insurance sectors (INFINITECH-RA). Moreover, this book reviews the concept of innovation and its application in INFINITECH, and presents innovative technologies provided by the project through practical examples from the financial sector.


“Technical Financial Innovation, Solving the Interoperability 
Problems of Europe” 


The third book begins by defining FinTech as the use of technology to underpin the delivery of financial services. It further discusses why FinTech is the focus of industry today, given the waves of digitization and the way financial technology (FinTech) and insurance technology (InsuranceTech) are rapidly transforming the financial and insurance services industry. The book introduces technology assets that follow the Reference Architecture (RA) for BigData, IoT and AI applications. Moreover, it describes the domain areas of the applications from the INFINITECH innovation project and the concept of innovation for the financial sector. Further, we describe the INFINITECH Marketplace and its components, including details of the available assets. Next, we provide descriptions of the solutions developed in INFINITECH.


What is Covered in this INFINITECH Part I Book?


“Concepts and Design Thinking Innovation addressing the Global 
Financial Needs” 


The INFINITECH Way Foundation is explained in simple words, introducing the design principles and basic metrics for enhancing the state of the art. In this book we illustrate how the INFINITECH Way can offer many advantages in the on-boarding process, providing speed and agility when optimizing the design and implementation of a financial services reference architecture, without the problems of dealing with complex vendor lock-in or proprietary infrastructure.

The book also shows how practical the INFINITECH Way is in multiple areas and how it is applied. The core elements of the INFINITECH Way are the Reference Architecture (RA) and the different methods to assess the success of its implementation. Further, it is shown how the INFINITECH Way is applied to validate a Reference Implementation addressing Big Data, IoT and AI applications for the financial and insurance sectors. The reference implementation will serve as a blueprint for rapid and cost-effective solution development and deployment. The INFINITECH-RA classifies and describes a set of common functional building blocks that will support advances in Big Data, AI and IoT applications.

In this book, a concise and detailed overview of the INFINITECH Reference Architecture is provided. The INFINITECH Data Pack is also explained; it contains the set of files, schemas, and metadata model diagrams (graphs) that represent how the INFINITECH Way and the financial data could be organized and structured. Furthermore, the core vocabularies of the INFINITECH Data Pack, including FIBO, FIGI, LKIF and INFINITECH Core, are cited and explained. This book also includes a practical example of how the INFINITECH Core is used and describes possible extensions, using the example of a global technology named Traffic Analysis Hub and its work towards formalizing its vocabularies as TAHO (Traffic Analysis Hub Ontology). Finally, a quick run-down of the INFINITECH technologies, data, and processes applied in the project is provided. In the section on self-assessment, we describe its integration with the RA. The impact of the INFINITECH Way on FinTech and insurance is discussed, and its application to pilots is illustrated.
Abstract


In this first part of the INFINITECH book series we begin by discussing the princi- 
ples of the modern economy that make the modern financial sector and FinTech the 
most disruptive areas in today’s global economy. INFINITECH envisions a large 
number of opportunities emerging for activating new channels of innovation on 
the local and global scale while at the same time catapulting opportunities for more 
disruptive user-centric services. INFINITECH is at the same time the result of a 
shared vision from a representative global group of experts, providing a common 
vision and identifying impacts in the financial and insurance sectors. 

The INFINITECH Way Foundations is explained in simple words, introducing the design principles and basic metrics for enhancing the state of the art. In this book we illustrate how the INFINITECH Way Foundations can offer many advantages in the on-boarding process, providing speed and agility when optimizing the design and implementation of a financial services reference architecture, without the problems of dealing with complex vendor lock-in or proprietary infrastructure. It is also illustrated how practical the INFINITECH Way Foundations is in multiple areas and how it is applied. The core elements of the INFINITECH Way Foundations are the Reference Architecture (RA) and the different methods to assess the success of its implementation. Further, it is shown how the INFINITECH Way Foundations is applied to validate a Reference Implementation addressing BigData, IoT and AI applications for the financial and insurance sectors. The reference implementation will serve as a blueprint for rapid and cost-effective solution development and deployment. The INFINITECH-RA classifies and describes a set of common functional building blocks that will support advances in BigData, AI and IoT applications.




In this book, a concise and detailed overview of the INFINITECH Reference Architecture is provided. The INFINITECH Data Pack is also explained; it contains the set of files, schemas, and metadata model diagrams (graphs) that represent how the INFINITECH Way Foundations and the financial data could be organized and structured. Furthermore, the core vocabularies of the INFINITECH Data Pack, including FIBO, FIGI, LKIF and INFINITECH Core, are cited and explained. This book also includes a practical example of how the INFINITECH Core is used and describes possible extensions, using the example of a global technology named Traffic Analysis Hub and its work towards formalizing its vocabularies as TAHO (Traffic Analysis Hub Ontology). Finally, a quick run-down of the INFINITECH technologies, data, and processes applied in the project is provided. In the section on self-assessment, we describe its integration with the RA. The impact of the INFINITECH Way Foundations on FinTech and insurance is discussed, and its application to pilots is illustrated.
The finance sector is among the most data-savvy and data-intensive sectors of the global economy. The ongoing digital transformation of financial organizations, along with their interconnection as part of a global digital finance ecosystem, is producing petabytes of structured and unstructured data. These data represent a significant opportunity for banks, financial institutions, and financial technology firms (FinTechs): by leveraging them, financial organizations can significantly improve both their business processes and the quality of their decisions. As a prominent example, modern banks can exploit customer data to anticipate the behaviors of their customers and to deliver personalized banking solutions to them. Likewise, data can enable new forms of intelligent algorithmic trading and personalized asset management [Soldatos et al., 2022].

To harness the benefits of big data, financial organizations need effective ways of managing and analyzing large volumes of structured, unstructured, and semi-structured data at scale. Furthermore, they need to manage both streaming data and data at rest, while at the same time providing the means for scalable processing by a variety of analytics algorithms. This processing may also require support for real-time analytics over heterogeneous types of data (i.e., structured/unstructured, streaming/data-at-rest). Management of heterogeneity is therefore one of the most important concerns for big data management in the financial sector. Currently, financial organizations spend significant effort and IT resources in unifying and processing different types of data, which typically reside in different repositories such as operational databases, analytical databases, data warehouses, data lakes, as well as emerging distributed ledger infrastructures (e.g., blockchains). Moreover, several applications also require semantic interoperability across datasets that are “siloed” in different systems, while the use of big data in real-life banking applications also requires pre-processing functions (e.g., anonymization), which support the ever-important regulatory compliance of digital finance applications. Despite the evolution of big data technologies, there is still a need for novel solutions that can successfully confront the above-listed challenges and enable the development, deployment, and operation of big data applications for digital finance at scale [Soldatos et al., 2022].

Big data management and analytics solutions enable Artificial Intelligence (AI) solutions in finance. AI refers to the capability of machines to imitate intelligent human behavior and act like humans. In many cases, AI systems emulate human thinking and reason over complex contexts in order to evaluate situations and take optimal decisions. As such, AI systems support two main processes: (i) a learning process that allows them to produce rules about how to process and use the information they receive, and (ii) reasoning that drives their decisions and actions. Several AI systems are already deployed or planned to be integrated in applications of the financial services industry. They are usually based on one or a combination of technologies such as video processing and visual scene analysis, speech recognition, Natural Language Processing (NLP), automated translation, machine learning, deep learning, and cognitive search. Typical AI applications in digital finance include robo-advisors, AI-based personalized asset management systems, statistical credit underwriting and risk assessment applications, automated and intelligent KYC (Know Your Customer) applications, fraud detection and anti-money laundering (AML), personalized finance applications for citizens and businesses, as well as a variety of front office applications such as chatbots. Moreover, there are many interesting AI applications in the insurance sector, such as automated insurance claims management and usage-based insurance, i.e., the statistical calculation of risk premiums based on data about the customers’ behaviors (e.g., lifestyle or driving behavior data). Most of these use cases leverage Machine Learning (ML) and Deep Learning (DL) techniques. However, the above list of AI use cases in finance is non-exhaustive. As more data becomes available, using AI to improve automation and personalization and to reduce costs will become more attractive, and FinTech enterprises are expected to produce novel AI-based ideas in the years to come. Nevertheless, in several cases AI deployments have to overcome barriers and limitations of existing big data management technologies. In other cases, integration with other emerging technologies (e.g., RPA (Robotic Process Automation) and blockchain technologies) is required. In this context, presenting tangible AI deployments in financial institutions is interesting both in terms of the technologies employed and in terms of the integration of AI in the digital transformation strategies of financial organizations [Soldatos et al., 2022].

The regulatory compliance of big data and AI solutions is also a major concern of financial institutions. Recent regulatory developments in the finance sector, such as the 2nd Payment Services Directive (PSD2) in Europe, open significant innovation opportunities for the sector by facilitating the flow of data across financial organizations. At the same time, these regulations make compliance processes more challenging and introduce security challenges. Other regulatory developments (e.g., MiFID II and the 4th AML Directive) affect the way certain use cases (e.g., fraud detection) are developed. Moreover, all data-intensive use cases should comply with data protection regulations such as the General Data Protection Regulation (GDPR) of the European Union (EU). At the same time, there are new regulations (e.g., eIDAS) which facilitate certain tasks in large-scale big data and AI use cases, such as the digital on-boarding and verification of customers. Overall, regulatory compliance has a two-way interaction with big data and AI applications: on the one hand, it affects the design and deployment of big data and AI applications; on the other, big data and AI can be used to boost regulatory compliance. Indeed, AI provides many opportunities for improving regulatory compliance in the direction of greater accuracy and cost-effectiveness. For instance, machine learning enables the collection and processing of very large amounts of data relevant to a financial or banking workflow, including structured data from conventional banking systems and alternative data such as news and social networks. These big data can be analyzed to automate compliance checks against regulatory rules. This is why many RegTech (Regulatory Technology) enterprises are AI-based: they leverage AI and big data analytics to audit compliance in real time (e.g., by processing streaming data as it arrives), while at the same time boosting the accuracy and richness of regulatory reporting.

Overall, the development, deployment and operation of novel big data and AI solutions for modern digital financial organizations requires a holistic approach that addresses all the issues listed above. Since October 2019, the European project INFINITECH (co-funded by the H2020 program of the European Commission) has been taking such a holistic and integrated approach to designing, deploying and demonstrating big data and AI solutions in Digital Finance. The project brings together a consortium of over forty organizations, including financial organizations, FinTechs, large vendors of AI and big data solutions, innovative high-tech Small and Medium Enterprises (SMEs), as well as established research organizations with a proven track record of novel research outcomes in AI, big data and blockchain technologies and their use in the finance sector. The present book aims to present INFINITECH's approach to big data and AI-driven innovations in Digital Finance, through a collection of scientific and technological development contributions that address most of the challenges identified earlier with innovative solutions. Specifically, the book presents a set of novel big data, AI and blockchain technologies for the finance sector, along with their integration in novel solutions. Furthermore, it puts emphasis on regulatory compliance issues, including technological solutions that boost compliance with some of the most important regulations of the sector.

INFINITECH is a joint effort of Europe's leaders in the ICT and Finance/Insurance sectors towards providing the technological capabilities, the experimentation facilities (testbeds & sandboxes) and the business models needed to enable European financial organizations, insurance enterprises and FinTech/InsuranceTech innovators to fully leverage the benefits of BigData, IoT and AI technologies. These benefits include a shift towards autonomous (i.e., automated and intelligent) processes that are dynamically adaptable and personalized to end-users’ needs, while being compliant with the sector's complex regulatory environment. It will provide: [the proposal]


* Novel Big Data/IoT technologies for seamless management and query- 
ing of all types of data (e.g., OLAP/OLTP, structured/ unstructured/semi- 
structured, data streaming & data at rest), interoperable data analytics, 
blockchain-based data sharing, real-time analytics, as well as libraries of 
advanced AI algorithms. 

* Regulatory tools incorporating various data governance capabilities (e.g., anonymization, eIDAS integration) and facilitating compliance with various regulations (e.g., PSD2 (2nd Payment Services Directive), 4AMLD (4th Anti-Money Laundering Directive), MiFID II).

* Novel and configurable testbeds & sandboxes, each one offering Open APIs 
and other resources for validating autonomous and personalized solutions, 
including a unique collection of data assets for finance/insurance. 


The project's results will be validated in the scope of 15 high-impact pilots providing complete coverage of the sectors, including Know Your Customer (KYC), customer analytics, personalized portfolio management, credit risk assessment, preventive financial crime analysis, fraud anticipation, usage-based insurance, agro-insurance and more. INFINITECH will establish a market platform that will provide access to the project's solutions, along with a Virtualized Digital Innovation Hub (VDIH) that will support innovators (FinTech/InsuranceTech) in their BigData/AI/IoT endeavors. Based on their strong footprint in the European digital finance ecosystem, the partners will engage stakeholders from all EU-28 countries, making INFINITECH synonymous with disruptive BigData/AI innovation in the target sectors.


DOI: 10.1561/9781638282297.ch2


Chapter 2 


INFINITECH Way Foundations 


2.1 INFINITECH Way Foundations 


The INFINITECH Project has introduced the “INFINITECH Way Foundations”, which offers best practices in FinTech development for the insurance and financial sectors in Europe. [Innovation Readiness Assessment]

The “INFINITECH Way Foundations” offers many advantages in the on-boarding process, providing speed and agility when optimizing the reference architecture, without the problems of dealing with complex, proprietary infrastructure. This on-boarding process outlines how to adopt containers and orchestration and how to transition implemented solutions to a containerized approach. The INFINITECH Readiness Level (IRL) focuses on evaluating the paths taken in adopting and deploying these technologies, with the aim of easy adoption and efficient deployment. [Innovation Readiness Assessment]

The INFINITECH project has successfully designed and used a holistic frame- 
work called INFINITECH Business Approach or “INFINITECH Way Founda- 
tions” to navigate through the design, development, and deployment of technology 
solutions in the financial services and insurance sectors. [INFINITECH Project 
Review Report] 

The resulting “INFINITECH Way Foundations” provides a strong technical framework for developing data-driven financial services applications. The business case for using the technical framework, for example through a business model specification, has not (yet) been validated. By relying on the “INFINITECH Way Foundations” as a development framework, SMEs, notably FinTech and InsuranceTech companies, should be able to advance more effectively and avoid reinventing the wheel over and over. [INFINITECH Project Review Report]

On the other hand, INFINITECH uses an innovative on-boarding approach called the “INFINITECH Way Foundations”, which also acts as a methodology outlining containerized solutions, orchestrated Development and Operations (DevOps) operations, and transitioning methods for implemented software assets, components, and solutions. For example, the “INFINITECH Way Foundations” is an innovative product that provides a strong technical framework for developing data-driven financial services applications, but a strong business model has not been clearly specified. [Innovation Readiness Assessment]

INFINITECH Way Foundations applies to:

* INFINITECH Reference Architecture (RA)
* INFINITECH Data Pack
  o FIBO, Financial Industry Business Ontology
  o FIGI, Financial Instrument Global Identifier
  o LKIF, the Legal Knowledge Interchange Format
  o INFINITECH Core
  o Example: TAHO (Traffic Analysis Hub Ontology)
* INFINITECH data, technologies, and processes
  o Interoperability Data Pack
  o CI/CD
  o KYC/KYB
* INFINITECH Way Self-assessment
* INFINITECH Way impact on FinTech and Insurance


Overall, the INFINITECH-RA will be built on several concepts that have been introduced by other reference architecture models, including models developed by industrial organizations (e.g., large IT enterprises) and associations (e.g., BDVA), as well as models produced by other projects. Moreover, it considers the structuring principles of BigData and AI applications in digital finance, as these principles are reflected in industrial reference architectures. Nevertheless, the INFINITECH-RA aims at introducing a more flexible approach: instead of providing a rigorous (but monolithic) structure for BigData/AI applications, it defines these applications as collections of data-driven pipelines, which are built from INFINITECH components. Hence, the INFINITECH-RA provides a number of layered architectural concepts and a rich set of digital building blocks, which enable the development of virtually any BigData or AI application in Digital Finance. The project will offer increased flexibility in defining data-driven applications in the sector, in a way that covers or subsumes most of the rigorous architectures outlined in earlier paragraphs. [D2.15]
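To make the idea of applications as collections of data-driven pipelines more concrete, here is a minimal Python sketch. It assumes that each building block can be modelled as a function that takes a batch of records and returns a new batch. The `Pipeline` class and the components `drop_incomplete`, `mask_customer_id` and `flag_large` are hypothetical illustrations, not INFINITECH APIs. In practice, such components would more likely run as separate (e.g., containerized) services than as in-process functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
# Hypothetical building block: a batch of records in, a new batch out.
Component = Callable[[List[Record]], List[Record]]


@dataclass
class Pipeline:
    """A data-driven pipeline assembled from an ordered list of components."""
    name: str
    stages: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> "Pipeline":
        self.stages.append(component)
        return self  # returning self allows chained construction

    def run(self, records: List[Record]) -> List[Record]:
        # Each stage consumes the output of the previous one.
        for stage in self.stages:
            records = stage(records)
        return records


# Hypothetical components, for illustration only.
def drop_incomplete(records: List[Record]) -> List[Record]:
    return [r for r in records if r.get("amount") is not None]


def mask_customer_id(records: List[Record]) -> List[Record]:
    # Crude placeholder for an anonymization step: hide the identifier.
    return [{**r, "customer_id": "***"} for r in records]


def flag_large(records: List[Record], threshold: int = 10_000) -> List[Record]:
    return [{**r, "large": r["amount"] >= threshold} for r in records]


pipeline = (Pipeline("transaction-screening")
            .add(drop_incomplete)
            .add(mask_customer_id)
            .add(flag_large))

result = pipeline.run([
    {"customer_id": "C1", "amount": 250},
    {"customer_id": "C2", "amount": None},
    {"customer_id": "C3", "amount": 15_000},
])
```

The design choice illustrated here is composition. Every stage shares the same input and output shape, so stages can be reordered, replaced or reused across pipelines. This is the kind of flexibility the RA contrasts with a monolithic structure.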


2.2 The INFINITECH Way Innovation 


Innovation is a process that requires identifying and defining clear activities in both the scientific and the industrial domains. The INFINITECH project also offers innovative methods and means for measuring innovation, i.e., the INFINITECH Innovation Readiness Assessment or Innovation Radar, which is constructed from the Customer Evaluation Levels, the IPR Readiness Level, the Compliance Maturity Level and the INFINITECH Readiness Level (IRL). In this document we review them and explain their advantages for catapulting innovation in the financial sector.

The INFINITECH Project (www.infinitech-h2020.eu) has introduced the “INFINITECH Way Foundation”, which offers best practices in FinTech development for the insurance and financial sectors in Europe.

In a complex and competitive European ecosystem, INFINITECH emerges offering efficient and agile methods, and has introduced an optimized, FinTech-friendly reference architecture that reduces the problems of dealing with proprietary infrastructures. The INFINITECH Reference Architecture offers many advantages for further development through the concepts of containers and sandbox development, and offers a simple-to-follow on-boarding process. A driver of the reference architecture is innovation and its potential exploitation, which should drive leadership in Europe and beyond its borders.


On the other hand, INFINITECH uses an innovative on-boarding approach called the INFINITECH Way, which also acts as a methodology outlining containerized solutions, orchestrated DevOps operations and transitioning methods.

The INFINITECH project relies on the capabilities of its partner members to produce value that is exploitable beyond the lab or a proof of concept. Figure 2.1 shows the INFINITECH Innovation Roadmap, where expertise from academia and research converges with industrial products and exploitation plans, using design principles and Reference Architectures to create a reference implementation for the financial and insurance sectors. The participation of stakeholders also complements this activity and brings value in transforming ideas into innovation.


2.3 Innovation Radar 


Innovation is always difficult to quantify, particularly when multiple and diverse aspects are involved in the innovation process. The INFINITECH project is no exception, and thus creative ideas and methods need to be devised to measure innovation.

The INFINITECH Pentagon is an adapted Kiviat diagram used as the tool to evaluate innovation in the INFINITECH project pilots. In the INFINITECH Pentagon, five dimensions are used to evaluate the relevant areas of a pilot, i.e., Customer, Market, Compliance, Technology and Resources. Each area is assessed and ranked quantitatively on an integer scale drawn as concentric rings, from 1 (lowest) to 5 (highest).

Figure 2.2 below shows an example of this graphical representation (called a Kiviat diagram).

Figure 2.2. INFINITECH pentagon (INFINITECH Innovation Readiness Mapping; axes: Customer, Market, Compliance, Technology, Resources).

After the evaluation of each dimension, each pilot will have its own pentagon showing its current level of innovation. The methodology to evaluate each dimension is straightforward: a weighted average combining two different evaluations:


* 50% of the final level is the ranking obtained through stakeholder feedback and evaluation, following the methodology developed and described in T7.8.

* 50% of the final level is the ranking obtained through either quantitative or qualitative analysis and categorization, as described for each dimension in the following sections.
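The 50/50 combination of the two evaluations can be sketched in Python as follows. This minimal sketch assumes that both evaluations are already expressed on the 1 to 5 scale and that the combined value is rounded half-up to an integer level. The source does not state the rounding rule. The function names and the sample levels are hypothetical.

```python
import math

DIMENSIONS = ("Customer", "Market", "Compliance", "Technology", "Resources")


def round_half_up(x: float) -> int:
    # Python's built-in round() rounds halves to even (round(2.5) == 2),
    # so half-up rounding is done explicitly (assumed rule).
    return math.floor(x + 0.5)


def dimension_level(stakeholder_level: float, analysis_level: float) -> int:
    """Combine the two evaluations of one dimension with equal (50/50) weights."""
    for value in (stakeholder_level, analysis_level):
        if not 1 <= value <= 5:
            raise ValueError("levels must lie on the 1-5 scale")
    return round_half_up(0.5 * stakeholder_level + 0.5 * analysis_level)


def pentagon(stakeholder: dict, analysis: dict) -> dict:
    """Final 1-5 level for each of the five pentagon dimensions."""
    return {d: dimension_level(stakeholder[d], analysis[d]) for d in DIMENSIONS}


# Hypothetical pilot: stakeholder-feedback levels and analysis levels.
feedback = {"Customer": 4, "Market": 3, "Compliance": 5, "Technology": 4, "Resources": 2}
analysis = {"Customer": 3.5, "Market": 2, "Compliance": 4, "Technology": 4, "Resources": 3}
print(pentagon(feedback, analysis))
# {'Customer': 4, 'Market': 3, 'Compliance': 5, 'Technology': 4, 'Resources': 3}
```

The resulting dictionary holds the five values that would be plotted on the axes of the pilot's pentagon.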


2.5.1 Customer


The Customer dimension is assessed through three different aspects: the number of Early Adopters, the number of Stakeholder Workshops performed, and the formalized interests of possible paying or internal users. Table 2.1 shows the categorization and weight of each component.


Table 2.1. Customer evaluation levels.

Categories/Levels        1     2     3     4      5
Early Adopters           0     1-2   3-5   6-9    10+
Stakeholders Workshops   0     1-3   4-8   9-14   15+
Prospects                0-1   2-4   5-9   10-19  20+


Here is an example: Pilot #X has 6 Early Adopters, performed 7 Stakeholder Workshops and gathered 18 letters of interest from other stakeholders. The corresponding boxes are highlighted in red; these values fall into levels 4, 3 and 4 respectively.

The resulting level of the weighted average is approximately 3.67 ((4 + 3 + 4)/3). Assuming that the level of the feedback assessment resulting from the stakeholder questionnaire (developed in T7.8) is 4, the overall level would be (3.67 + 4)/2 ≈ 3.83, which is rounded to 4.
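The categorization in Table 2.1 and the Pilot #X calculation can be reproduced with a short Python sketch. The text calls the component average "weighted" but lists no weights. This sketch therefore assumes equal weights, which reproduces the worked example (11/3, about 3.67). It also assumes half-up rounding of the final level. The dictionary keys and function names are hypothetical.

```python
import math
from bisect import bisect_right

# Lower bounds of levels 2, 3, 4 and 5 for each component, read from Table 2.1;
# any count below the first bound is level 1.
THRESHOLDS = {
    "early_adopters": (1, 3, 6, 10),         # 0 | 1-2 | 3-5 | 6-9 | 10+
    "stakeholder_workshops": (1, 4, 9, 15),  # 0 | 1-3 | 4-8 | 9-14 | 15+
    "prospects": (2, 5, 10, 20),             # 0-1 | 2-4 | 5-9 | 10-19 | 20+
}


def component_level(component: str, count: int) -> int:
    """Map a raw count to its 1-5 level using the Table 2.1 ranges."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # bisect_right returns how many lower bounds the count has reached.
    return 1 + bisect_right(THRESHOLDS[component], count)


def customer_analysis_level(counts: dict) -> float:
    """Equal-weight average of the three component levels (assumed weights)."""
    levels = [component_level(name, counts[name]) for name in THRESHOLDS]
    return sum(levels) / len(levels)


def customer_level(counts: dict, feedback_level: float) -> int:
    """50% analysis, 50% stakeholder feedback, rounded half-up (assumed rule)."""
    combined = 0.5 * customer_analysis_level(counts) + 0.5 * feedback_level
    return math.floor(combined + 0.5)


# The Pilot #X example from the text.
pilot_x = {"early_adopters": 6, "stakeholder_workshops": 7, "prospects": 18}
print(round(customer_analysis_level(pilot_x), 2))  # 3.67
print(customer_level(pilot_x, feedback_level=4))   # 4
```

Storing only the lower bound of each level keeps the table compact, and `bisect_right` handles the open-ended top band (10+, 15+, 20+) without a special case.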


Table 2.2. Pilot 6 evaluation example using customer levels.

Categories/Levels        1     2     3     4      5
Early Adopters           0     1-2   3-5   6-9    10+
Stakeholders Workshops   0     1-3   4-8   9-14   15+
Formalized Interests     0-1   2-4   5-9   10-19  20+


2.5.2 Market


The Growth Share Matrix is a fundamental tool to identify the most promising businesses and markets. It is based on the principle that higher and sustainable returns are the core of solid market leadership, and that it is possible to obtain them by analyzing company competitiveness and market attractiveness (through relative market share and growth rate), thus prioritizing investments in markets and business units according to their degree of profitability. Moreover, the matrix highlights the gap that the enterprise itself has generated with respect to its competitors by gaining a significant cost advantage over them.

Given that relative market share and growth are the two drivers of the matrix, each quadrant displays a specific combination of them. As shown in the picture below, markets and business units can be classified as “cash cows”, “stars”, “question marks” or “pets” (also known as “dogs”):


• Cash cows are markets/business units with a low growth rate and a high relative market share, which can be leveraged to generate cash for reinvestment in more profitable opportunities.

• Stars are characterized by a high growth rate and a high relative share; they are markets/business units with high future potential.

• Question marks are high-growth, low-share markets/business units, which can be further exploited or abandoned according to their probability of becoming stars.

• Pets are low-share, low-growth markets/business units, for which companies should change their position, divest or liquidate.
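The quadrant logic above can be expressed as a small Python function. The cut-offs are assumptions, since the text gives no numeric thresholds: 10% market growth and a relative market share of 1.0 (own share divided by that of the largest competitor) are common textbook choices. The hypothetical `gsm_code` helper returns the growth;share notation used in Table 2.3 (e.g. "H;L"); all other names are hypothetical as well.

```python
# Hypothetical cut-offs (not specified in the text): 10% annual market growth and a
# relative market share of 1.0, i.e. parity with the largest competitor.
GROWTH_CUTOFF = 0.10
SHARE_CUTOFF = 1.0


def relative_market_share(own_share: float, largest_competitor_share: float) -> float:
    """Own market share divided by the share of the largest competitor."""
    return own_share / largest_competitor_share


def gsm_code(growth_rate: float, rel_share: float) -> str:
    """Growth;share code in the Table 2.3 notation, e.g. 'H;L'."""
    growth = "H" if growth_rate >= GROWTH_CUTOFF else "L"
    share = "H" if rel_share >= SHARE_CUTOFF else "L"
    return f"{growth};{share}"


QUADRANTS = {
    "H;H": "star",
    "H;L": "question mark",
    "L;H": "cash cow",
    "L;L": "pet",
}


def quadrant(growth_rate: float, rel_share: float) -> str:
    """Classify a market/business unit into one of the four quadrants."""
    return QUADRANTS[gsm_code(growth_rate, rel_share)]


print(quadrant(0.15, relative_market_share(0.30, 0.20)))  # star
print(quadrant(0.03, 2.0))                                # cash cow
```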


Growth Share Matrix 


As Figure 2.3 illustrates, a key factor is the ability of a company to gain a leading position in its market before growth slows, since a product will most probably end up as either a cash cow or a pet. In Table 2.3, the first parameter of each Growth Share Matrix entry represents growth (L or H) and the second represents market share (L or H).


Figure 2.3. INFINITECH growth share matrix example.

Figure 2.4. Strategic positioning model.


Table 2.3. Market evaluation levels.
Parameter | 1 | 2 | 3 | 4 | 5
Growth Share Matrix | L;L | L;L | L;H | H;L | H;H
Strategic Positioning Model | B;L | B;L | N;L | B;D | N;D


Strategic Positioning Model 


The Strategic Positioning Model, shown in Figure 2.4, is a tool that helps analyze the competitive advantage and differentiation of a specific solution with respect to its competitors.

There are two main variables, the target customers (B, broad; N, narrow) and the perceived value (L, low cost; D, differentiation), giving four possible combinations. INFINITECH is an innovation project, and therefore differentiation is its objective. The lowest value is given to the so-called cost leadership (B;L), in the upper left of the figure. Focused cost leadership (N;L) is considered slightly better, broad differentiation (B;D) is an improvement, and focused differentiation (N;D) is of the highest value.
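The ordering of the four strategies can be encoded directly, as in the Python sketch below. The rank returned is only the ordinal position stated in the text (1 = cost leadership, 4 = focused differentiation); it is not the five-point pentagon level of Table 2.3. The names `STRATEGY_ORDER` and `classify` are hypothetical.

```python
# The four target/value combinations, from lowest to highest value for an
# innovation project, in the order given in the text.
STRATEGY_ORDER = [
    (("B", "L"), "cost leadership"),
    (("N", "L"), "focused cost leadership"),
    (("B", "D"), "broad differentiation"),
    (("N", "D"), "focused differentiation"),
]


def classify(target: str, value: str) -> tuple:
    """Return (strategy name, ordinal rank 1-4) for target in {B, N} and value in {L, D}."""
    for rank, (key, name) in enumerate(STRATEGY_ORDER, start=1):
        if key == (target, value):
            return name, rank
    raise ValueError(f"unknown combination {target};{value}")


print(classify("N", "L"))  # ('focused cost leadership', 2)
```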
2.3.3 Compliance


Nowadays, compliance is an important tool in the context of competition, as it can provide a competitive advantage. The INFINITECH approach addresses this key aspect from two perspectives, Intellectual Property Rights (IPR) and Compliance Maturity, using two tools: the IPR Readiness Level and the Compliance Maturity Model.


IPR Readiness Level 


The first tool assesses IPR readiness by considering the entire IPR release process, as shown in Table 2.4.


Table 2.4. IPR Levels.

Level | Description
1 | Hypothesizing on possible IPR (patentable inventions)
2 | Identified specific patentable inventions or other IPR
3 | Detailed description of possible patentable inventions; initial search of the technical field and prior art
4 | Confirmed novelty and patentability; decided on alternative IP protection if not patenting
5 | First complete patent application filed; draft of IPR strategy done
6 | Positive response on patent application; initial assessment of freedom to operate; patent strategy supporting business
7 | Patent entry into national phase; other formal IPR registered
8 | First patent granted; IPR strategy fully implemented; more complete assessment of freedom to operate
9 | Patent granted in relevant countries; strong IPR support for business

Source: https://www.researchgate.net/figure/Intellectual-Property-Readiness-Level-Part-of-TRL-Hasenauer-et-al-Managing figá 313063121


Compliance Maturity Model 


The second tool assesses compliance with internal policies, i.e. whether the company working on the solution follows specific procedures. Each pilot should follow specific procedures and have a quality plan and a risk management procedure in place. The guidelines provided by the INFINITECH project can be found in D1.2 and its future versions. To assess the compliance value, the selected model is the Compliance Maturity Model presented in Table 2.5, an adaptation of the widely used CMMI (Capability Maturity Model Integration).
Table 2.5. Compliance maturity model.

COMPLIANCE MATURITY MODEL (CMMI-LC)
Level | Name | Characteristic | Description
5 | Optimizing Effective | Optimizing for effectiveness | Optimizing certainty of outcomes, reduction of risk.
4 | [...] | Outcomes measured and controlled | Outcomes are being tracked; feed-forward processes are introduced to improve effectiveness, with a focus on achieving outcomes and advancing capabilities. Continuous improvement at the program level.
3 | Defined System | Proactive rather than reactive | Standards provide guidance; feed-back processes are [...] performance and eliminating variation. Continuous improvement at the system level.
2 | Managed Process | Processes measured and controlled | Processes are planned, performed, measured and controlled. Continuous improvement at the process level.
1 | Initial | Unpredictable and reactive | Work gets completed, but often delayed or over budget, and with unpredictable output or outcomes.

Source: https://www.leancompliance.ca/post/capabilities-maturity-model-for-compliance


Table 2.6. IPR and CMM Levels.

Parameter | 1 | 2 | 3 | 4 | 5
IPR Readiness Level | 1 | 2-3 | 4-5 | 6-7 | 8-9
CM Model | 1 | 2 | 3 | 4 | 5
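The mapping in Table 2.6 is simple enough to state in code. In the Python sketch below, the IPR Readiness Level (1 to 9) is folded into five bands with integer arithmetic (`level // 2 + 1` reproduces the bands 1, 2-3, 4-5, 6-7 and 8-9), while the Compliance Maturity Model level is used as is. The text does not state how the two sub-levels are combined into the Compliance score; the equal-weight average here is an assumption made by analogy with the Customer dimension, and the function names are hypothetical.

```python
def ipr_to_level(ipr_readiness: int) -> int:
    """Map an IPR Readiness Level (1-9) to a pentagon level (1-5), per Table 2.6."""
    if not 1 <= ipr_readiness <= 9:
        raise ValueError("IPR Readiness Level must be between 1 and 9")
    # 1 -> 1, 2-3 -> 2, 4-5 -> 3, 6-7 -> 4, 8-9 -> 5
    return ipr_readiness // 2 + 1


def cmm_to_level(cmm_level: int) -> int:
    """Compliance Maturity Model levels map 1:1 onto pentagon levels."""
    if not 1 <= cmm_level <= 5:
        raise ValueError("CMM level must be between 1 and 5")
    return cmm_level


def compliance_level(ipr_readiness: int, cmm_level: int) -> float:
    """Assumed equal-weight average of the two sub-levels."""
    return (ipr_to_level(ipr_readiness) + cmm_to_level(cmm_level)) / 2


print([ipr_to_level(n) for n in range(1, 10)])  # [1, 2, 2, 3, 3, 4, 4, 5, 5]
print(compliance_level(6, 3))                    # 3.5
```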


2.3.4 Technology


Technology is a key dimension of the INFINITECH Innovation Readiness Mapping. It can be split into two sub-categories: the Technology Readiness Level (TRL) and the INFINITECH Readiness Level (IRL).


Technology Readiness Level


The Technology Readiness Level indicates the maturity phase of a technology according to the EU adoption of the NASA TRL scale developed in the 1970s, spanning from the observation of basic principles to an actual system proven in an operational environment. TRLs emerged as a guide that enables technologists from different backgrounds, as well as the general public, to have consistent, uniform discussions of technical maturity across different types of technology. TRLs are measured on a scale from 1 to 9, with 1 being the lowest level and 9 the most mature technology. The US Department of Defense has used the scale for procurement since the early 2000s. By 2008 the scale was also in use at the European Space Agency (ESA), and in 2010 the European Commission advised EU-funded research and innovation projects to adopt it. In 2013, the TRL scale was further canonized by the ISO 16290:2013 standard, and TRLs were consequently used in 2014 in the EU Horizon 2020 program. The TRL is determined during a Technology Readiness Assessment (TRA) that examines program concepts, technology requirements and demonstrated technology capabilities. The second sub-category, the IRL, assesses the actual level of development of a pilot with respect to the INFINITECH Way.


INFINITECH Readiness Level 


The INFINITECH Readiness Level (IRL) is a self-assessment method created to evaluate and provide guidance during the adoption process of generic or specific technology assets and/or technology conditions related to the deployment phase of a pilot project. The main objective of the IRL is to act as a tool that helps and guides pilot leaders in identifying and selecting ready-to-go technologies. The IRL is also useful in the on-boarding process of the INFINITECH Way Foundation, where technologies that have been developed in advance can be adopted at any stage during the execution of the pilots.

The IRL is applicable to any program (i.e. an INFINITECH pilot), at any stage, that operates or wants to make use of the technology assets developed or implemented for the financial and insurance sectors. The IRL defines 5 levels, scored 1 to 5, where 1 is the lowest and 5 the highest. The IRL can be used directly in the INFINITECH pentagon (i.e. the Kiviat diagram), as it maps directly onto the innovation process/radar, thereby facilitating the path towards high levels of innovation.


On-Boarding Process: the development of ad-hoc technologies is expensive and cumbersome if adequate resources (i.e. technological and personnel) are not allocated. Re-using other technologies can facilitate the development and deployment of better applications, improving the way teams develop and deliver innovative applications rapidly and efficiently in order to achieve competitive advantage and operational efficiency.

The INFINITECH project has introduced the “INFINITECH Way Foundation”, which offers many advantages in the on-boarding process, providing speed and agility when optimizing the Reference Architecture without the problems of dealing with complex, proprietary infrastructure. This on-boarding process outlines how to adopt containers and orchestration and how to transition implemented solutions to a containerized approach. The IRL focuses on evaluating the paths taken in the adoption and deployment of these technologies, so as to ensure easy adoption and efficient deployment.


Self-Assessment: the IRL gives any project (INFINITECH pilot) the capacity to self-assess its level of adoption against identified technological characteristics in different technology domain areas, i.e. Data Modeling and Data Interoperability, Infrastructure Deployment and Services Platform Adoption, Information Management and Analytics, and Intelligent Applications. The technology domain areas are defined by following the requirements from ICT experts for a full-stack implementation, and the IRL method allows the evolution of the pilot project to be self-assessed during the adoption or on-boarding phase. The IRL also adds a temporal dimension to the on-boarding process, which helps to self-assess not only the way the project is being implemented but also the validity of the process for following that path.


IRL Level 1: Focuses on the identification and definition of the basic conditions to start a pilot with as much re-use of INFINITECH baseline technologies as possible. As a first step, other technologies outside the INFINITECH ecosystem may be used, but with a clear design perspective for on-boarding INFINITECH assets in the next iteration.

IRL Level 2: Ready-to-go technologies for building a prototype in a client-server model. A series of demonstrators can be made available to test/prove the deployment of basic functionalities. At this level integration is not required, but it is recommended to have data model integration and common access control tools.

IRL Level 3: First deployed demonstrator with all functionalities in a cloud-based deployment. The use of the INFINITECH Way Foundation and Reference Architecture is the core of this level. Demonstrators are essential to explain high-level use cases, as is the use of data sets that follow the INFINITECH data model or other best practices/standards in the particular domain of the pilot demonstration.

IRL Level 4: Cross-domain services deployed with DevOps, ready for easy deployment. Data is required to be shared across different domains; this can be tested with simple applications or integrated in the cross-domain query system. At this level all the components are integrated following DevOps techniques and the orchestration methods described in the INFINITECH Way Foundation.

IRL Level 5: Implemented Proof of Concept (PoC) with interoperable cross-domain services; DevOps and sandbox deployments are featured to facilitate the inclusion of the assets and solutions in the INFINITECH marketplace.


Table 2.7 summarises the IRL levels, providing short descriptions of what is observed and considered for evaluation at each IRL level and its correspondence with the technology domains, described as parameters and as part of the self-assessment evaluation points.
Table 2.7. Infinitech readiness level (IRL) self-assessment chart.
Parameter | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Data Modeling | Vocabulary Identified | Taxonomy Ready | Data Model Logic & Physical | Validated Data Schema | Data Graph Ready
Data & Interoperability | Data Set Sample Ready | Data Storage Deployed | Query Data Tests Performed | Cross-Domain Query | Data Sharing/Exchange
Trustworthiness, Security & Privacy | Data Protection Methods | Access Control Tools | Identity Management | Platform Access Control | Self-Sovereign Identity
Infrastructure | Local Host | Client-Server Mode | Cloud Environment | Docker Ready | Scale Up Tested
Services Platform | Communication Services using Service APIs | Management Services | Continuous Monitoring Tools | Services using Infinitech Orchestration Methods | Infinitech Orchestration and DevOps Compliance
Applications | Use Cases with videos | Prototype | Demonstrator | Online Services | Marketplace Ready


Table 2.8. TRL and IRL levels mapped with the innovation pentagon.

Parameter | 1 | 2 | 3 | 4 | 5
TRL | 1 | 2-3 | 4-5 | 6-7 | 8-9
IRL | 1 | 2 | 3 | 4 | 5
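Table 2.8 can be applied mechanically. The Python sketch below encodes the TRL bands as explicit ranges and passes the IRL through unchanged, since it maps 1:1. As with the other dimensions, averaging the two sub-levels into a single Technology level is an assumption, and the function names are hypothetical.

```python
# TRL bands per pentagon level, from Table 2.8 (the end of each range is exclusive).
TRL_BANDS = {
    1: range(1, 2),
    2: range(2, 4),
    3: range(4, 6),
    4: range(6, 8),
    5: range(8, 10),
}


def trl_to_level(trl: int) -> int:
    """Return the pentagon level whose TRL band contains `trl`."""
    for level, band in TRL_BANDS.items():
        if trl in band:
            return level
    raise ValueError("TRL must be between 1 and 9")


def technology_level(trl: int, irl: int) -> float:
    """Assumed equal-weight average; the IRL maps 1:1 onto the pentagon."""
    if not 1 <= irl <= 5:
        raise ValueError("IRL must be between 1 and 5")
    return (trl_to_level(trl) + irl) / 2


print(technology_level(6, 3))  # (4 + 3) / 2 = 3.5
```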


Table 2.8 shows how the TRL and IRL map onto the INFINITECH Innovation Readiness Mapping using the proposed Kiviat diagram (Figure 2.2). The IRL is mapped 1:1 onto the innovation assessment radar and thus intrinsically reflects the evolution of the pilot's technology adoption in the listed/selected domain areas. Note that the listed areas have been validated with ICT experts and technical pilot managers in the INFINITECH project, but they can also be extended and adapted to particular pilot needs.


2.3.5 Resources


Resources are the fuel that drives the business and can be represented by three key variables: financials; staffing and skills; and organization readiness. The first can be assessed quantitatively through the ROI indicator (in this case, the expected or foreseen ROI; see Table 2.9), which indicates the return the company may obtain on its investments. It is a key tool for verifying the quality of the decisions made in managing assets and liabilities.
Table 2.9. Resources levels.

Parameter | 1 | 2 | 3 | 4 | 5
Expected ROI | 0-20% | 20-40% | 40-60% | 60-80% | >80%
Team Readiness Capability Level | 1 | 2 | 3 | 4 | 5


Figure 2.5. Team readiness capability assessment model (TeamReadiness Capability Assessment Model, Copyright © 2009 by TeamReadiness, Inc., www.TeamReadiness.com). Much like a report card, a score between 1 (poor) and 5 (excellent) is given for each readiness factor in an effort to quantify readiness capability; the scores are averaged, giving a Readiness Capability Level for the team. Readiness factors: Command & Communicate, Document & Share, Tools & Techniques, Internal Resources, External Resources, Train & Drill, Risks. Readiness improves when (1) the knowledge team members need for each readiness factor is documented, (2) documented knowledge is easy to use, understand and access, and (3) knowledge is willingly shared among team members.


The second and third variables are qualitative and provide important information on the overall know-how of the company. They can be evaluated in terms of both the set of skills of the company staff and whether the company implements and adheres to schedules and detailed documentation, which also smooths internal processes. The self-assessment tool selected for this purpose is the “Team Readiness Capability Assessment Model” (Figure 2.5), which quantifies the readiness capability level of team category groups.

There are five levels of team readiness, which are related to the analysis of the following variables: command and communication, document and share, tools and techniques, internal resources, external resources, training and drilling, and risks. The corresponding levels are the following:


• Lv.1: Not prepared
• Lv.2: Ready to react
• Lv.3: Emergency Readiness
• Lv.4: Situational Readiness
• Lv.5: Team Readiness
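The two quantitative inputs of the Resources dimension can be sketched in Python as follows. Two points are assumptions rather than statements from the text: Table 2.9 does not say which band a boundary value such as exactly 20% belongs to, so here a boundary goes to the lower band (consistent with the open-ended ">80%" band), and a negative ROI is treated as level 1. The team score follows the report-card rule of Figure 2.5: every readiness factor is scored from 1 to 5 and the scores are averaged. All names are hypothetical.

```python
from statistics import mean

# Readiness factors listed in the TeamReadiness Capability Assessment Model (Figure 2.5).
READINESS_FACTORS = (
    "Command & Communicate", "Document & Share", "Tools & Techniques",
    "Internal Resources", "External Resources", "Train & Drill", "Risks",
)


def roi_level(expected_roi_pct: float) -> int:
    """Map expected ROI (in percent) to a level using the Table 2.9 bands."""
    for level, upper in enumerate((20, 40, 60, 80), start=1):
        if expected_roi_pct <= upper:  # boundary values go to the lower band (assumption)
            return level
    return 5  # > 80%


def team_readiness_level(scores: dict) -> float:
    """Average the 1-5 scores given to each readiness factor."""
    missing = set(READINESS_FACTORS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    if any(not 1 <= scores[f] <= 5 for f in READINESS_FACTORS):
        raise ValueError("each score must be between 1 and 5")
    return mean(scores[f] for f in READINESS_FACTORS)


scores = dict.fromkeys(READINESS_FACTORS, 3)
scores["Risks"] = 5
print(roi_level(35), round(team_readiness_level(scores), 2))  # 2 3.29
```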


2.4 Certification 


The World Bank defines the financial sector as a group of resilient, transparent and smooth-functioning financial systems and capital markets that contribute to financial stability, job growth and poverty alleviation. On the other hand, according to Investopedia, the financial sector is a section of the economy made up of firms and institutions that provide financial services to commercial and retail customers. Either of these two definitions could potentially suit different stakeholder needs. There is no common agreement, and thus no common way to assess and evaluate innovation in this complex sector. The only point of agreement is that this sector comprises a broad range of industries, including banks, investment companies, insurance companies and real estate firms, and that, under current business conditions, with FinTechs emerging rapidly with very disruptive applications and services, best practices for business innovation are required.

Under current financial market conditions, there is no way to ensure that financial services are designed with an adequate level of interoperability and deployed following best design practices in order to compete with the very disruptive FinTech markets. However, there are good indicators coming from the INFINITECH Innovation Readiness, INFINITECH On-boarding Process and INFINITECH Readiness Level approaches that may constitute the first step towards creating the certification program required to ensure that best practices are followed, standards are used, and the technologies created, implemented, deployed and tested within the INFINITECH Marketplace are used.

Figure 2.6 shows the potential certification label to be awarded to pilots that demonstrate (by following the certification programme) not only that they are part of the INFINITECH ecosystem, but also that all their technologies follow the INFINITECH Way and that the applications provided at the pilot level are fully compliant with the innovation levels.


Figure 2.6. INFINITECH innovation-certification programme. 
2.5 Infinitech Terminology


The common vocabularies of the insurance and financial sectors are evolving in the same way as society and the economy. The INFINITECH project takes a step ahead and identifies and defines the terminology required for innovation activities, from simple concepts like assets and case studies to more elaborate terms like integration and marketplace, which are emerging as trend activities in software systems. Table 2.10 summarises the INFINITECH terminology, which can support a better understanding at the INFINITECH project level but also in other FinTech activities.


Table 2.10. INFINITECH adopted terminology.
Concept/Term | Definition
Asset | A technology that was implemented or extended in the context of the INFINITECH project.
Case Studies | An example (idea, slides, diagram, video, demo) of how an INFINITECH technology works within a pilot.
Demonstrator/Demo | An implemented example of how a pilot could potentially implement the case studies.
PoC | The implemented concept(s) or use cases of a pilot, progressing from a demo. Usually, case studies are the main concept of a pilot, and use cases are short experiences that can be proven, describing the case studies as a demo.
Use Cases | Implemented experiences that can be proven, forming part of a larger case study idea, with the objective of providing insight towards an integrated PoC.
Exploitation | The potential of a case study to be implemented and commercialised as part of an INFINITECH PoC implementation.
Integration | A PoC showing how to use one or various INFINITECH technologies (i.e. an integrated demo). Note that demos that do not use INFINITECH technologies remain in the demo category.
Deployment | A demo or PoC installed/instantiated on an INFINITECH server or on stakeholder infrastructure.
Service | The networking conditions or software tools for implementing a PoC following the case studies.
Operation | A demo running on an INFINITECH server or on stakeholder infrastructure.
Application | A group of software programs performing functions towards the use cases in the PoC.
Marketplace | An INFINITECH server or stakeholder infrastructure showing assets, i.e. data, platforms, components, pilot results, etc.


DOI: 10.1561/9781638282297.ch3


Chapter 3 


Reference Architectures Analysis 


3.1 Reference Architecture Analysis 


Reference Architectures (RAs) are designed to facilitate the design and development of concrete technological architectures, mostly in the IT domain, reducing risks through proven components while improving overall communication within an organization. The actual drawbacks and benefits of RAs have been analyzed with respect to the project’s pilots.

RAs facilitate the development of concrete IT architectures and reduce maintenance costs. In general, the value of RAs can be summarized in the following points:


• reduction of development and maintenance costs of systems
• facilitation of communication between important stakeholders
• reduction of risks


Typically, when a system is designed without an RA, an organisation may accumulate technical risks and end up with a complex and non-optimal implementation architecture.

In industry, complex infrastructures for big data systems and high-performance computing (HPC) have been developed and proven to sustain intensive data processing services (e.g. Netflix, Facebook, Twitter, LinkedIn). The architectures and technologies of these world-class infrastructures have been published, and RAs have been designed and proposed. However, very few solutions have been published for the financial and insurance sectors, and this book aims to partially fill this gap.


3.2 Methodology 


The methodology for the development of the INFINITECH-RA is based on the 
following overlapping phases: 


• Phase 1, Review and Analysis (M2-M7): This phase focused on the analysis of relevant technical and scientific information, ensuring that the INFINITECH-RA considers recent developments in BigData architectures in general and for the finance sector in particular. It analysed the BDVA RA, along with BigData architectures for digital finance applications that have been introduced and are widely used in the industry. In this context, BigData/AI environments and tools were reviewed as well. Likewise, this phase considered the stakeholder requirements (from T2.1 of the project), as well as regulatory compliance requirements and the specifications of INFINITECH technologies reflected in D2.5 and D2.7, respectively. The overarching goal of this phase was to make sure that the design of the INFINITECH-RA aligns with key requirements of BigData applications in digital finance, as well as with the evolution of the state of the art in BigData in finance.

• Phase 2, Architecture Design (M6-M12): This phase produced the initial design of the INFINITECH-RA, using the 4 + 1 view methodology [Kruchten95] for specifying software architectures. The selection of this proven and well-known methodology was motivated by the fact that INFINITECH pilot systems are essentially software systems; hence, a methodology for specifying software architectures was applicable. In line with the 4 + 1 architecture, the INFINITECH-RA is described as a collection of complementary views (i.e. process, development, logical, physical) that represent the dynamic and static behavior of the systems, as well as relevant implementation aspects. Likewise, the "+1" in 4 + 1 signifies the need to confront the architecture with a number of scenarios, such as the INFINITECH pilots. As part of this second phase, a set of main views for the INFINITECH-RA was specified and described, as presented in subsequent paragraphs. Note that the phase extends beyond the delivery of the first version of the INFINITECH-RA (i.e. at M12 of the project), as several of the presented views will be revised in the coming months.

• Phase 3, Fine Tuning and Updates (M12-M27): The third development phase of the INFINITECH-RA started after the submission of D2.13. Phase 3 aims at revising and fine-tuning the INFINITECH-RA based on its actual use in other work packages, including the sandbox and pilot development work packages. It will receive and exploit stakeholder feedback from the practical use of the INFINITECH-RA for developing BigData/AI systems for the finance sector. The feedback will be exploited to improve aspects of the INFINITECH-RA, but also to validate the introduced architectural concepts in practice. This phase will lead to the final version of the INFINITECH-RA as part of deliverable D2.15, i.e. the third and last version/release of this deliverable.


Figure 3.1 illustrates the three development phases of the INFINITECH-RA and the main activities that they comprise.


Figure 3.1. High-level overview of the project’s phased approach to INFINITECH-RA development: Phase 1 (M2-M7), Review & Analysis; Phase 2 (M6-M12), Architecture Design; Phase 3 (M12-M27), Fine Tuning and Updates.


3.3 General AI/BigData Challenges


3.3.1 Big Data


Over the past few years, and in the context of big data, new challenges have emerged in data analytics, turning attention to methodologies that raise the abstraction level and improve the convergence rate of algorithms. To this end, several big data frameworks have been developed by researchers and engineers, and the respective algorithms, mostly domain-specific, have been compiled. However, the development of scalable and distributed analytics solutions for extreme-scale analytics in the finance/insurance sector remains quite a complicated process, given the diversity of datasets and data sources.
Data sources can be of different types (structured, semi-structured or completely unstructured), in various formats and with different data accessibility options. Due to these factors, modern enterprises rely on a variety of different and heterogeneous data management systems to handle this data diversity: relational datastores that store structured data compliant with an entity-relationship model, where ensuring transactional semantics and data consistency is of major concern; document-based data management systems that store semi-structured data; key-value stores, which are considered efficient for storing data coming from IoT sensors or logging information (e.g. navigation among different web pages, or simply the details of a finance transaction that took place); and even HDFS data lakes, whose use is now considered prominent and which facilitate the analysis of any available unstructured data. However, analysis over the superset of an organization's data management systems is not a trivial task. Joining data across datasets is very costly and difficult to implement efficiently at the application or data processing level. To overcome this barrier, various analytical frameworks have been proposed that provide polyglot capabilities and abstract this problem away from data scientists, making the data management process transparent from their perspective.

Accessing heterogeneous data sources (a concept often addressed by data integration systems or multidatabases [Ozsu11], [Tomasic98]) is a problem that has been widely studied in the literature, and with the recent emergence of cloud databases and big data processing it has evolved towards polystore systems. Early implementations [Minpeng11], [Ong14], [Simitsis12] relied on a single common model to which the target datastores had to transform their schemas. A further improvement was presented by the polystore BigDAWG [Duggan15], [Gadepally16], which defines islands of information, each of which makes use of a single data model and language. Nowadays, Spark SQL [Armbrust15] exploits massively parallel processing over a federation of different and heterogeneous stores. It defines the notion of a dataframe and provides connectors for a variety of supported datastores. By providing a single interface and a common query language, it pushes query execution down to the target databases when possible, and retrieves data into dataframes that are used for further processing. This, however, requires retrieving the data into the data analysis layer, which can be memory greedy. Similarly, Presto allows for massively parallel processing, where a coordinator orchestrates query execution on several workers that make use of corresponding connectors, all implementing a common interface in order to hide the details of query execution on the target database. In the same direction,
Apache Drill and Impala maintain the notion of the data connectors that are able to 
retrieve data from external and heterogeneous datastores, and transform it to their 
own intermediate format and model that can be used for data processing in the 
upper layer of the framework. All those approaches provide polyglot capabilities, 


26 Reference Architectures Analysis 


however, they are additional frameworks that require the dataset to be retrieved into
memory, and they rely on massive parallelism to scale out adequately and cope with
these requirements. The challenge lies in exploiting the attributes and statistics of
each individual dataset held in each store, in order to optimize query execution and
retrieve only the minimum amount of data needed by each query operator, thus
minimizing memory consumption and network traffic.
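The benefit of retrieving only the data each operator needs can be illustrated with a minimal sketch, assuming a single in-memory SQLite table stands in for one of the federated stores (the `payment` table, its columns and the threshold are hypothetical). Both functions return the same large payments, but only the second pushes the selection down to the store, so far fewer rows reach the analysis layer.

```python
import sqlite3


def make_store(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create a hypothetical in-memory 'payment' table standing in for a remote store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payment (id INTEGER PRIMARY KEY, account TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO payment (account, amount) VALUES (?, ?)",
        ((f"acc{i % 100}", float(i % 1000)) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def large_payments_no_pushdown(conn: sqlite3.Connection, threshold: float):
    """Fetch the whole table into memory, then filter in the analysis layer."""
    rows = conn.execute("SELECT id, account, amount FROM payment").fetchall()
    return [r for r in rows if r[2] > threshold], len(rows)


def large_payments_pushdown(conn: sqlite3.Connection, threshold: float):
    """Push the filter down to the store; only matching rows are transferred."""
    rows = conn.execute(
        "SELECT id, account, amount FROM payment WHERE amount > ?", (threshold,)
    ).fetchall()
    return rows, len(rows)


if __name__ == "__main__":
    store = make_store()
    _, moved_naive = large_payments_no_pushdown(store, 990.0)
    _, moved_pushed = large_payments_pushdown(store, 990.0)
    print(moved_naive, moved_pushed)  # prints: 10000 90
```

A real polystore optimizer would make this decision per operator from dataset statistics; the sketch only shows why the decision matters for memory and network traffic.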

Beyond the analytics level, running analytics over stale data is not ideal. For
example, consider a recommender system that aims to predict the behavioural patterns
of a user, her preferences or dislikes, and to provide personalised recommendations of
relevant items. In a real-world scenario, user preferences change frequently, and new
data arrive continuously in real time. A recommender system should, if possible, adapt
to these changes as they happen, modifying its model to always reflect the current
status while requiring a single pass through the data. Furthermore, retraining the
model from scratch on a continuously growing dataset is computationally inefficient
and leads to unnecessarily inflated infrastructure costs. Hence, data management
approaches are required both to account for new datasets and to support such
incremental analytics.
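As an illustration of single-pass, incremental analytics, the following sketch maintains a user's preference profile that is updated per event instead of being recomputed over the full history. The class, the exponential decay factor and the category names are hypothetical choices for illustration, not a recommender algorithm prescribed by INFINITECH.

```python
from collections import defaultdict


class IncrementalPreferenceProfile:
    """Per-user category scores updated in a single pass over the event stream.

    Older evidence is multiplied by `decay` on every new event, so recent
    behaviour dominates and the profile follows changing preferences.
    """

    def __init__(self, decay: float = 0.9) -> None:
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.scores = defaultdict(float)  # category -> decayed score
        self.events_seen = 0

    def update(self, category: str, weight: float = 1.0) -> None:
        # Age the existing evidence, then add the new observation.
        for key in self.scores:
            self.scores[key] *= self.decay
        self.scores[category] += weight
        self.events_seen += 1

    def top(self, k: int = 1) -> list:
        return sorted(self.scores, key=self.scores.__getitem__, reverse=True)[:k]


if __name__ == "__main__":
    profile = IncrementalPreferenceProfile(decay=0.8)
    for _ in range(10):
        profile.update("sports")
    print(profile.top())  # ['sports']
    for _ in range(5):
        profile.update("music")
    print(profile.top())  # ['music']: adapted without reprocessing the history
```

Each update costs time proportional to the number of categories, not to the length of the history, which is the property the paragraph above asks for.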

Apart from scenarios where systems and analytical algorithms should continuously take
into account ever-growing datasets in order to update the results of their analysis,
they also need to analyze real-time data in order to respond to events and create
alerts and notifications as events occur. An example of such a scenario is fraud
detection, where users must be notified as soon as a fraudulent transaction occurs.
However, dealing with data related to financial transactions raises many concerns
regarding data consistency and isolation. This is why traditional relational database
management systems are used: they ensure transactional (from a database theory
perspective) semantics, guaranteeing that a series of actions will be executed all or
none (Atomicity); that the data will remain consistent, i.e., no modification will
break the rules imposed by the database administrator, such as a bank account balance
that cannot be negative (Consistency); that parallel actions can be executed
concurrently, but the result of their modifications will be as if each action had been
executed in sequence (Isolation); and, finally, that data will be preserved in case of
a system failure and all information can be recovered (Durability). These four
attributes constitute the ACID properties that traditional database management systems
ensure, and they are of major importance in the finance sector. These databases are
ideal for handling operational workloads, and they rely on locks on the data items
being modified when a transaction accesses a data table. However, to perform fraud
detection, the system needs to read a variety of data by scanning the whole data
table; in other words, it needs to perform an analytical operation.
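Atomicity and consistency can be seen in a small sketch using Python's built-in `sqlite3` module (an assumption for illustration; any ACID relational database behaves similarly). The `CHECK (balance >= 0)` constraint encodes the administrator's rule, and the connection's context manager commits the transfer or rolls it back as a whole. The account names and amounts are hypothetical.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Consistency rule imposed by the administrator: balances can never be negative.
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back on any exception
            # Credit first on purpose: if the debit then fails, the credit must be undone too.
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "bob", "alice", 50), balances(bank))  # False, nothing changed
    print(transfer(bank, "alice", "bob", 30), balances(bank))  # True, both rows updated
```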
Operational and analytical operations are incompatible, and one blocks the other: a
lock held by an operational transaction will prevent an analytical scan from being
executed. Likewise, a long-running analytical operation will place read locks on the
whole data table, thus blocking all operational workloads: the database will not be
able to serve customer needs for as long as the long-running operation takes place.
Due to this, modern enterprises take a snapshot of the operational database and send
it to a data warehouse, thus separating the two workloads: the relational datastore
serves operational workloads and ensures ACID properties, whereas the data warehouse
serves analytical query processing. The drawback of this approach is that a fraudulent
transaction will only be detected the following day, as moving data from one store to
the other is a costly operation that relies on heavy ETL jobs performed during the
night. Even modern approaches that apply microbatching to the data movement process
(i.e., the modifications of a transaction are sent to a data queue, and a worker
periodically takes data from the queue and sends them to the analytical warehouse)
only allow for near real-time analytical processing and cannot detect events as they
happen. As a result, analytics over operational data remains a major challenge.
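The latency inherent in microbatching can be made concrete with a toy sketch, assuming an in-process `queue.Queue` in place of a real message broker and a Python list in place of the warehouse (the class and method names are hypothetical). Committed changes only become visible to analytics at the next worker tick, which is what makes this approach near real-time rather than real-time.

```python
import queue
from typing import Any, Dict, List


class MicroBatchPipeline:
    """Toy change-data pipeline: operational commits -> queue -> periodic batch -> warehouse."""

    def __init__(self, batch_size: int = 100) -> None:
        self.changes: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        self.warehouse: List[Dict[str, Any]] = []
        self.batch_size = batch_size

    def on_commit(self, change: Dict[str, Any]) -> None:
        # Called by the operational side after each committed transaction.
        self.changes.put(change)

    def worker_tick(self) -> int:
        # One periodic run of the worker: move at most batch_size changes.
        moved = 0
        while moved < self.batch_size:
            try:
                change = self.changes.get_nowait()
            except queue.Empty:
                break
            self.warehouse.append(change)
            moved += 1
        return moved

    def suspicious(self, limit: float) -> List[Dict[str, Any]]:
        # An analytical query: it only sees what has already reached the warehouse.
        return [c for c in self.warehouse if c["amount"] > limit]


if __name__ == "__main__":
    p = MicroBatchPipeline(batch_size=2)
    for tx_id, amount in [(1, 50.0), (2, 9_000.0), (3, 20.0)]:
        p.on_commit({"tx": tx_id, "amount": amount})
    print(p.suspicious(1_000.0))  # [] : committed, but not yet visible to analytics
    p.worker_tick()
    print(p.suspicious(1_000.0))  # [{'tx': 2, 'amount': 9000.0}]
```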


3.3.2 Data Pipelines


Data management also needs to be enhanced, both to meet the needs of extreme scale
(in terms of efficient data throughput and access) and to address the variety of
federated and distributed systems that hold the corresponding datasets. What is more,
one needs to consider changes in the underlying systems (and datasets), and thus the
required level of dynamicity, which raises the challenge of designing approaches that
pipeline data from heterogeneous distributed systems to analytics frameworks while
adapting to these changes.

Tightly-coupled multistore systems trade autonomy for performance, typically 
in a shared-nothing cluster, to integrate structured (RDBMS) and HDFS data. 
Polybase [Minukhin18] is a feature of Microsoft SQL Server Parallel Data Ware- 
house to access HDFS data using SQL. It allows HDFS data to be referenced 
through external PDW tables and joined with native PDW tables using SQL 
queries. Hybrid systems are similar to tightly-coupled systems, e.g. integrating 
HDFS and RDBMS in a shared-nothing cluster, except that the HDFS data is
accessed through a data processing framework like MapReduce. For example, QoX 
[Xu18] uses a dataflow approach for optimizing queries over relational (RDBMS 
and ETL) data and unstructured (HDFS) data, with a black box approach for cost 
modelling. 


Moreover, when dealing with analytics performed on a variety of data sources, it is
no less important to focus on the data aspects. Data-intensive distributed frameworks
such as Apache Spark and Apache Drill can access multiple data stores using a unified
API such as SQL. However, applications running on these frameworks have direct access
to specific data stores, and as a result to specific datasets, while frameworks such as
Apache Ranger offer standardized access authorization to data stores, but only for a
limited set of supported data stores and with limited policies.

What is required is an approach to data management that minimizes the data
pipelining process and enables hybrid management of data, for both analytical and
transactional workloads. The latter would enable analytics to account for the
different datasets as soon as they are ingested into the data stores (mainly through
the respective transactions).
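To contrast with the microbatching approach described earlier, the sketch below keeps operational writes and an analytical aggregate on the same store, so a rule evaluated right after a commit already sees the new transaction. SQLite is used only for brevity and is not an HTAP engine; the table, the rule and the limit are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tx (id INTEGER PRIMARY KEY, account TEXT NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def record_and_check(conn: sqlite3.Connection, account: str, amount: float, limit: float) -> bool:
    """Commit one operational write, then immediately run an analytical aggregate.

    Returns True when the account's cumulative amount exceeds `limit`
    (a hypothetical fraud rule); no ETL step separates the two workloads.
    """
    with conn:  # operational part: a committed transaction
        conn.execute("INSERT INTO tx (account, amount) VALUES (?, ?)", (account, amount))
    (total,) = conn.execute(  # analytical part: aggregate over the fresh data
        "SELECT COALESCE(SUM(amount), 0) FROM tx WHERE account = ?", (account,)
    ).fetchone()
    return total > limit


if __name__ == "__main__":
    store = open_store()
    print(record_and_check(store, "acc1", 400.0, 1000.0))  # False
    print(record_and_check(store, "acc1", 700.0, 1000.0))  # True
```

At scale, the locking conflicts described above are exactly what makes this simple pattern hard, which is why dedicated HTAP support is listed as a requirement.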


3.4 Specific Challenges for the Finance Sector 


3.4.1 Siloed Data and Business Operations 


One of the most prominent challenges faced by banks and financial organizations is the
fragmentation of data across different data sources such as databases, data lakes,
transactional systems (e.g., e-banking) and OLAP (On-Line Analytical Processing)
systems (e.g., Customer Data Warehouses). This is why financial organizations are
creating BigData architectures that provide the means for consolidating diverse data
sources. As a prominent example, the Bank of England has recently established a
“One Bank Data Architecture” based on a centralized data management platform. This
platform facilitates the BigData Analytics tasks of the bank, as it permits analytics
over significantly larger datasets.

Note that the need to reduce data fragmentation has also been underlined by financial
institutions following the financial crisis of 2008, when several financial
organizations had no easy way to perform integrated risk assessments, as different
exposures (e.g., to subprime loans or ETFs (Exchange-Traded Funds)) were siloed across
different systems. INFINITECH-RA must therefore provide the means for reducing data
fragmentation and taking advantage of previously siloed data in integrated BigData
Analytics and ML tasks.
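Why siloed data hides risk can be shown with a minimal sketch, assuming each system exposes a simple mapping from counterparty to exposure (the counterparty codes, amounts and limit are hypothetical). No single silo breaches the limit, but the consolidated view does.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def consolidate_exposures(*silos: Mapping[str, float]) -> Dict[str, float]:
    """Sum exposures per counterparty across independently managed systems."""
    total: Dict[str, float] = defaultdict(float)
    for silo in silos:
        for counterparty, exposure in silo.items():
            total[counterparty] += exposure
    return dict(total)


def limit_breaches(exposures: Mapping[str, float], limit: float) -> List[str]:
    """Counterparties whose exposure exceeds the limit, sorted by code."""
    return sorted(cp for cp, value in exposures.items() if value > limit)


if __name__ == "__main__":
    loans = {"CP-A": 60.0, "CP-B": 20.0}    # e.g. a lending system
    trading = {"CP-A": 50.0, "CP-C": 30.0}  # e.g. a trading/ETF book
    print(limit_breaches(loans, 100.0), limit_breaches(trading, 100.0))  # [] []
    print(limit_breaches(consolidate_exposures(loans, trading), 100.0))  # ['CP-A']
```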


3.4.2 Real Time Performance Requirements 


Real-time computing refers to IT systems that must respond to changes within definite
time constraints, usually on the order of milliseconds or seconds. In the financial
and insurance sectors, real-time constraints apply where a response must be given to
provide services to users or organizations, and they are on the order of seconds or
more. Examples range from banking applications to cybersecurity. Most real-world
financial applications are NOT real time in the sense of industrial automation (e.g.,
plant control), and their timing needs are usually met by putting more computing
resources (CPU/GPU power, memory, ...) at the problem. However, in the case of ML/DL
and BigData, algorithms can take a significant amount of time and become useless in
practice (responses arrive too late to be used). In those cases, a quantitative
assessment of the computing time of algorithms is needed to configure resources so
that response times are acceptable.
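A quantitative assessment of computing time can start from measured latency percentiles compared against a response-time budget. The sketch below is one possible approach, assuming the algorithm can be called as a plain Python function; the nearest-rank percentile definition, the p99 default and the function names are illustrative choices.

```python
import math
import time
from typing import Any, Callable, Dict, Iterable, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, with q in (0, 100]."""
    if not samples or not 0 < q <= 100:
        raise ValueError("need samples and 0 < q <= 100")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


def profile_latency(fn: Callable[[Any], Any], inputs: Iterable[Any]) -> Dict[str, float]:
    """Call fn on each input and report latency percentiles in milliseconds."""
    timings_ms = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {f"p{q}": percentile(timings_ms, q) for q in (50, 95, 99)}


def fits_budget(stats: Dict[str, float], budget_ms: float, key: str = "p99") -> bool:
    """True if the chosen percentile stays within the response-time budget."""
    return stats[key] <= budget_ms


if __name__ == "__main__":
    stats = profile_latency(lambda n: sum(range(n)), [100_000] * 50)
    print(stats, fits_budget(stats, budget_ms=50.0))
```

Tail percentiles rather than averages are used because a response that arrives too late is useless even if most responses are fast; the budget itself must come from the use case.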


3.4.3 Mobility


The digital transformation of financial institutions includes a transition to
mobile-first banking [Bons12], i.e., to interaction between customers and financial
organizations through mobile channels as part of mobile banking. INFINITECH must
support mobile channels when visualizing BigData, AI and IoT applications for digital
finance, but also when collecting and processing user’s/customer’s input.


3.4.4 Omni-Channel Banking - Multiple Channels Management 


One of the main trends in banking and finance is the transition from conventional
multi-channel banking to omni-channel banking [Komulainen18]. The latter refers to
seamless and consistent interactions between customers and financial organizations
across multiple channels. Hence, omni-channel banking/finance
focuses on integrated customer interactions that comprise multiple transactions, 
rather than individual financial transactions. The INFINITECH-RA should pro- 
vide the means for supporting omni-channel interactions through creating unified 
customer views and managing interactions across different channels based on inte- 
grated/consolidated information about the customer. Note that BigData analytics 
is the cornerstone of omni-channel banking as it enables the creation of unified 
views of the customers and the execution of analytical functions (including ML) 
that track, predict and anticipate customer behaviors. 
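A unified customer view can be sketched, under simplifying assumptions, as merging per-channel interaction feeds into one chronological timeline per customer. The `Interaction` record, the channel names and the integer timestamps are hypothetical; a real system would also need identity resolution across channels.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interaction:
    customer_id: str
    channel: str    # e.g. "branch", "web", "mobile"
    timestamp: int  # simplified: seconds since epoch
    event: str


def unified_views(*channel_feeds: Iterable[Interaction]) -> Dict[str, List[Interaction]]:
    """Merge per-channel feeds into one time-ordered timeline per customer."""
    views: Dict[str, List[Interaction]] = defaultdict(list)
    for feed in channel_feeds:
        for interaction in feed:
            views[interaction.customer_id].append(interaction)
    for timeline in views.values():
        timeline.sort(key=lambda it: it.timestamp)
    return dict(views)


if __name__ == "__main__":
    branch = [Interaction("c1", "branch", 50, "opened account"),
              Interaction("c2", "branch", 70, "cash deposit")]
    web = [Interaction("c1", "web", 100, "viewed mortgage offer")]
    mobile = [Interaction("c1", "mobile", 200, "started mortgage application")]
    for it in unified_views(branch, web, mobile)["c1"]:
        print(it.timestamp, it.channel, it.event)
```

Analytical functions that track or predict behaviour can then operate on one timeline per customer instead of on each channel separately.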


3.4.5 Automation 


Data-intensive applications, including those in the banking and insurance sectors, are
typically realized by specialized IT staff and RDBMS administrators. Recently, more
and more data scientists and business analysts have become involved in the
development of such applications, at significant cost. The INFINITECH-RA should
provide the means for orchestrating data-intensive applications and their management
by making it easy to create workflows and data pipelines.
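One way to let analysts declare a workflow, rather than hand-code its execution order, is to describe it as a dependency graph and let a small runner execute it. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the runner, the step names and the toy transformations are hypothetical and stand in for real ingestion, cleaning and analytics components.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_pipeline(steps: Dict[str, Callable[..., Any]],
                 deps: Dict[str, List[str]]) -> Dict[str, Any]:
    """Execute each step after all of its upstream steps.

    deps[name] lists the upstream steps whose results are passed, in that
    order, as positional arguments to steps[name]. A cycle raises
    graphlib.CycleError (a ValueError subclass) before any step runs.
    """
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in deps.get(name, [])]
        results[name] = steps[name](*upstream)
    return results


if __name__ == "__main__":
    steps = {
        "ingest": lambda: [120.0, -5.0, 3000.0, 42.0],         # hypothetical amounts
        "clean": lambda rows: [r for r in rows if r >= 0],       # drop invalid rows
        "flag": lambda rows: [r for r in rows if r > 1000],      # toy analytics rule
        "report": lambda clean, flagged: f"{len(flagged)} of {len(clean)} flagged",
    }
    deps = {"ingest": [], "clean": ["ingest"], "flag": ["clean"], "report": ["clean", "flag"]}
    print(run_pipeline(steps, deps)["report"])  # 1 of 3 flagged
```

Declaring only the dependencies keeps the workflow easy to change: adding a step means adding one entry to each dictionary, and the runner works out the order.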
3.4.6 Transparency


During the last couple of years, financial organizations and customers of digital
finance services have raised the issue of transparency in the operation of BigData
systems as a key prerequisite for the wider adoption and use of BigData analytics
systems in financial sector use cases. This is particularly important for use cases
involving the deployment and use of AI/ML systems that operate as black boxes and are
hardly understandable by finance sector stakeholders. Hence, a key requirement for
AI/ML financial sector applications is the ability to explain their outcomes. For
example, a recent paper by the Bank of England [Bracke19] illustrates the importance
of providing explainable and transparent credit risk decisions.

Overall, INFINITECH shall support transparency in AI/ML workflows based on
Explainable Artificial Intelligence (XAI) techniques such as LIME (Local Interpretable
Model-agnostic Explanations) [Ribeiro16], which fits an interpretable model locally
around an individual prediction. XAI techniques should be supported as part of the
project’s ML techniques, but also as add-ons to them, as in several cases they will be
used to interpret the outcomes of other ML/AI techniques (e.g., some classifiers).
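The idea behind LIME can be sketched in a few lines of NumPy: perturb the instance being explained, query the black-box model, weight the perturbed samples by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified illustration under stated assumptions (Gaussian perturbations of continuous features, an exponential proximity kernel); it is not the API of the `lime` package, and the function name and parameters are hypothetical.

```python
import numpy as np


def local_surrogate(predict_fn, x, n_samples=5000, scale=0.1, kernel_width=None, seed=0):
    """LIME-style local explanation (simplified sketch).

    Returns one weight per feature: the slope of a weighted linear model
    fitted to the black-box predictions in a neighbourhood of `x`.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    d = x.size
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # heuristic similar to LIME's tabular default
    # 1. Sample perturbations around the instance being explained.
    offsets = rng.normal(0.0, scale, size=(n_samples, d))
    samples = x + offsets
    # 2. Query the black-box model on the perturbed inputs.
    y = np.asarray(predict_fn(samples), dtype=float)
    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(offsets / scale, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares on [1, offsets]; sqrt-weights reduce it to ordinary LS.
    design = np.hstack([np.ones((n_samples, 1)), offsets])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[1:]


if __name__ == "__main__":
    # A hypothetical black-box score: increases with feature 0, decreases with feature 1.
    def black_box(X):
        return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))

    print(local_surrogate(black_box, [0.0, 0.0]))  # roughly [0.5, -0.25]
```

The signs and magnitudes of the returned weights indicate which features pushed this particular prediction up or down, which is the kind of per-decision explanation the credit risk example calls for.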


3.5 Reference Architecture for Big Data/IoT/AI in 
Finance/Insurance 


INFINITECH introduces and validates a Reference Architecture (RA) for BigData, IoT
and AI applications in the financial and insurance sectors (INFINITECH-RA), which will
serve as a blueprint for the rapid and cost-effective development and deployment of
solutions. The INFINITECH-RA will specify a set of building blocks that will support
advanced BigData, AI and IoT applications. These building blocks will support
scalable, unified and interoperable data collection from different sources and
databases (e.g., OLTP (On-Line Transactional Processing), OLAP (On-Line Analytical
Processing), Data Lakes, SQL databases, NoSQL databases, alternative data sources),
efficient real-time predictive analytics, multi-channel/omni-channel interactions,
data governance functionalities, as well as interoperable data sharing and
interactions between stakeholders of the financial & insurance value chains. The
INFINITECH-RA will specify the structuring principles that will drive the integration
of these building blocks in real-life solutions. It will serve as a basis for
designing, developing and deploying novel BigData, AI and IoT solutions that feature
“SHARP” (Smart, Holistic, Autonomy, Personalized and Regulatory Compliance)
characteristics. The project will also provide a number of blueprints for developing
and deploying solutions aligned to the INFINITECH-RA. The blueprints will be based on
the elaboration of different designs and deployment configurations tailored to the
needs of specific solutions. Both the INFINITECH-RA and its relevant blueprints will
address functionalities that are prioritized as part of the SRIA (Strategic Research
and Innovation Agenda) of the BDVA (BigData Value Association), while considering and
consolidating concepts from RAs introduced by relevant standardization bodies and
associations. [the proposal]

It will provide an overview of the technical requirements and specifications driving
the project, as well as the detailed specification of the INFINITECH data models,
technology/regulatory building blocks and the INFINITECH-RA. Here, the following
objectives are addressed: (i) to articulate stakeholders’ requirements regarding
BigData and IoT-based services with SHARP properties in the financial and insurance
sectors; (ii) to refine and detail the SHARP properties of various services in the
target sectors; (iii) to analyze the background BigData and IoT platforms that will
support the pilots and testbeds of the project, and to detail how they will be
enhanced in order to empower the INFINITECH vision; (iv) to specify the security and
regulatory compliance requirements of the INFINITECH services, while at the same time
specifying the relevant solutions to be used in the project; (v) to specify the
capabilities of the testbeds that will support the development and deployment of
SHARP services; (vi) to specify the INFINITECH-RA.


3.6 INFINITECH Reference Architecture 


The Reference Architecture (RA) of the INFINITECH project is aimed at developing
Smart, Autonomous and Personalized Services in the European Finance and Insurance
Services Ecosystem. The INFINITECH partners selected the “4+1” architectural view model
as the methodology for working on the RA, as presented in D2.13. The methodology is
based on five different views from which the structure of the system can be analyzed
(logical view, process view, development view, physical view and scenarios).
Moreover, it will be demonstrated that all the functionalities of the INFINITECH
environment are properly covered by this model. [D2.13]

The State-of-the-Art survey underlines that some already existing Reference 
Architectures provide substantial input to INFINITECH, such as the pipelined 
and workflow approach to support the functionalities of the different Pilots and Use 
Cases of the project. Relevant inputs to the task have been considered, in particular 
the input coming from use-cases considered in task “User Stories and Analysis of 
Stakeholders’ Requirements” and a cross reference matrix. [D2.13] 

Finally, a layered and high-level reference view and a detailed logical view of the
RA are presented. Different layers have been identified (infrastructure, data
management and protection, data processing and architecture, analytics, interface,
and presentation/visualization). The layers are mainly a means of classifying the
building blocks to form different workflows. The resulting RA provides a schema for
building solid workflows and ensures full communication and interaction between all
the building blocks, from the data source level (at the infrastructure level of the
organizations) through the Data Stores and Processing Analytics up to presentation
and visualization applications. [D2.13]

High-Performance Computing (HPC) resources can be distributed across nodes within the
platform, supporting a high degree of scalability. Moreover, the RA accounts for
external data sources such as public and private Data Lakes, IoT networks and
Blockchains. A list of identified building blocks provides the basic functionalities 
of the INFINITECH reference sandbox for a more general class of use cases. Build- 
ing blocks will be identified where existing technologies are available while other 
components will be designed, implemented, and integrated according to different 
assigned tasks. [D2.13] 

The validity of the RA has been demonstrated by mapping the workflows of the
project’s pilots onto it, ultimately confirming the conceptual approach of the
INFINITECH RA. [D2.13]

The RA constitutes a living solution constantly verified during the continu- 
ous project development and in particular with the different pilots. Moreover, the 
Consortium will promote the RA, along with its methodology and technological 
advancements, during project dissemination as a more general solution applicable 
to a broader set of different use cases beyond the original scope in the Financial and 
Insurance sectors whenever Big Data and AI are to be considered. [D2.15] 

RAs are designed to facilitate the design and development of concrete technological
architectures, mostly in the IT domain, reducing risks through proven components while
improving overall communication within an organization. The real drawbacks and
benefits of RAs have been analyzed with respect to the project’s Pilots: the RA
facilitated the development of concrete IT architectures and reduced maintenance
costs. In general, the value of RAs can be summarized in the following points: [D2.15]


* Reduction of development and maintenance costs of systems. 
* Facilitation of communication between important stakeholders. 
* Reduction of risks. 


Typically, when a system is designed without an RA, an organization may accumulate
technical risks and end up with a complex and non-optimal implementation
architecture. [D2.15]

In the industry, complex infrastructures for big data systems and high perfor- 
mance computing (HPC) have been developed and proved to sustain intensive data 
processing services (Netflix, Facebook, Twitter, LinkedIn, etc.). The architectures
and technologies of world-class infrastructures have been published and RAs have
been designed and proposed. However, very few solutions have been published for 
the Financial and Insurance sectors. In the following sections, some relevant Refer- 
ence Architectures and Models will be considered along with their relevance to the 
domain sector at which INFINITECH is aimed. [D2.15] 

The purpose of a Reference Architecture is to provide a conceptual and logical 
schema for solutions to a large class of problems. In the INFINITECH project 
the domain is as vast as the Financial and Insurance Sectors where most of the 
applications are data-driven. The class of problems of the project (pilots and use 
cases), and in general the service management of financial institutions and insurance
companies, are largely based on data that should be managed in the safest and most
protective way. [D2.13]

In these domains, enormous customer datasets must be processed to derive information
that enables better and more competitive services, while respecting complex and
sometimes conflicting regulatory frameworks covering privacy, security,
interoperability, etc. [D2.13]

Therefore, a Reference Architecture should have explored in advance the specific 
domains in which the class of problems must find solutions providing a general 
model to which stakeholders (end-users, business owners, designers, data scientists, 
developers, maintainers, etc.) can refer for best practices in the specific
problem-solution space. In information technology, an RA can be used to check
solutions to a particular problem in that class against best practices and specific
technologies. The INFINITECH RA is no exception: it is the result of the analysis of a
significant number of use cases in the project’s pilots, their requirements (users’
stories) and constraints (regulatory, sector and technological), as well as of
state-of-the-art technologies and similar architectures. [D2.13]

It is important to state what the RA is in the INFINITECH project:


* A set of views for the Logical, Process, Development and Physical implemen- 
tations. 

* A set of common scenarios referring to generic use cases. 

* A way to verify the use cases' scenarios and solutions. 

* A way to speak the same language among stakeholders. 

* A way to leverage solutions referring to best practices and building blocks. 

* A way to verify whether constraints (requirements, regulatory, technical, and
logical) have been addressed properly.


It is also important to state what the INFINITECH RA is NOT: 


* A ready-to-deploy technological IT framework. 
* A rigid and immutable set of connecting building blocks.
* A set of mandatory rules for development and integration.
* A manual for implementation and rollouts. 
* A one-size-fits-all recipe for all business cases. 


The INFINITECH Reference Architecture (IRA) (shown in Figure 3.2 and explained in
Deliverable D2.13) defines a set of layers and allows flexible workflows to be built
out of data processing modules, called components, that perform specific
data transformations. A workflow usually consumes one set of data (sources) to
produce another set of data (destinations). A component can be identified by the 
Input/Output interfaces and the functionality (transformation) provided at the 
edge. In that respect, a data component can be considered as a black box that can be 
replaced by any technological implementation that performs at the edge the same 
transformation. [D2.13] 
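The black-box view of components can be sketched with a structural interface: any object offering the same `transform` signature and the same input/output behaviour can replace another in a workflow. The `DataComponent` protocol, the two filter implementations (one in memory, one backed by SQLite, which assumes records carry only `id` and `amount`) and the fixed conversion rate are hypothetical examples, not INFINITECH components.

```python
import sqlite3
from typing import Any, Dict, List, Protocol, Sequence

Record = Dict[str, Any]


class DataComponent(Protocol):
    """Hypothetical component interface: records in, transformed records out."""

    def transform(self, records: List[Record]) -> List[Record]: ...


class NonNegativeFilterInMemory:
    """Keeps records whose 'amount' is >= 0, using plain Python."""

    def transform(self, records: List[Record]) -> List[Record]:
        return [r for r in records if r["amount"] >= 0]


class NonNegativeFilterSQLite:
    """Same transformation, executed by SQLite (records must have only 'id' and 'amount')."""

    def transform(self, records: List[Record]) -> List[Record]:
        conn = sqlite3.connect(":memory:")
        try:
            conn.execute("CREATE TABLE r (seq INTEGER, id TEXT, amount REAL)")
            conn.executemany(
                "INSERT INTO r VALUES (?, ?, ?)",
                [(i, rec["id"], rec["amount"]) for i, rec in enumerate(records)],
            )
            rows = conn.execute(
                "SELECT id, amount FROM r WHERE amount >= 0 ORDER BY seq"
            ).fetchall()
        finally:
            conn.close()
        return [{"id": rid, "amount": amount} for rid, amount in rows]


class ConvertCurrency:
    """Multiplies 'amount' by a fixed (hypothetical) exchange rate."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def transform(self, records: List[Record]) -> List[Record]:
        return [{**r, "amount": round(r["amount"] * self.rate, 2)} for r in records]


def run_workflow(components: Sequence[DataComponent], records: List[Record]) -> List[Record]:
    """Feed the output of each component into the next one."""
    for component in components:
        records = component.transform(records)
    return records
```

Swapping `NonNegativeFilterInMemory` for `NonNegativeFilterSQLite` in a workflow leaves the result unchanged, which is the replaceability property described above.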

Figure 2.1 depicts the different reference layers of the IRA along with examples of
data components. The scope of this section is to identify existing and desired data
components that will support the design, development and deployment of the Pilots’
sandboxes. [D2.13]

The “black-boxes” follow predefined colors in order to map components in the 
different layers. An INFINITECH solution (sandbox) can be built by organizing
components in a workflow (sometimes referred to as a data pipeline) to accomplish a
complex transformation from one set of data sources to another set of data that
solves the business case. Components should therefore be interoperable, have clear
interfaces and perform a clear function over the data. The rest of this section is a
first attempt to identify the existing and to-be-developed components of the project.
Further progress in the project will provide the needed details for implementation 
and deployment. 

The first step is the ingestion of data from the relevant data sources, typically
stored in operational databases. This data ingestion is massive and can take many
hours. The challenge here is to accelerate data ingestion without sacrificing the
querying speed that becomes crucial for the next steps.
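A common way to speed up ingestion without hurting later queries is to load rows in large batches, each committed as one transaction, and to build the indexes that downstream queries need once the load has finished. The sketch below shows this pattern with SQLite; the `tx` table, the batch size and the index are hypothetical, and actual speed-ups depend heavily on the database engine.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Tuple


def bulk_ingest(conn: sqlite3.Connection,
                rows: Iterable[Tuple[str, float]],
                batch_size: int = 5_000) -> int:
    """Load (account, amount) rows in batches, then index the column queried downstream."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tx (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
    )
    it = iter(rows)
    loaded = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        with conn:  # one transaction per batch instead of one per row
            conn.executemany("INSERT INTO tx (account, amount) VALUES (?, ?)", batch)
        loaded += len(batch)
    # Building the index once after the initial load avoids maintaining it row by row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tx_account ON tx(account)")
    return loaded


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    n = bulk_ingest(db, ((f"acc{i % 10}", float(i)) for i in range(50_000)))
    print(n, db.execute("SELECT COUNT(*) FROM tx WHERE account = 'acc3'").fetchone()[0])
```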

The second step is data protection, which coordinates the invocation of components
that implement privacy, security, or data protection techniques, as well as other
external services, in order to provide a suitable privacy, security and data
protection level, as specified by a secure service provider in compliance with
regulations. Further, the Data Protection Orchestrator coordinates several privacy,
security and data protection components and services to ensure that protected data
can subsequently be processed or stored while preserving their privacy and security.
It also allows the protection to be removed from the results (if required) before
they are delivered to the end user.
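The protect-then-unprotect flow described above can be illustrated with keyed pseudonymisation, one of several possible protection techniques. In this sketch, which assumes that the orchestrator simply chains components, each hypothetical `PseudonymizeField` step replaces a field with an HMAC-SHA-256 token and keeps a private token table so that results can be re-identified before delivery; the field names and records are invented for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class PseudonymizeField:
    """Hypothetical protection component: replaces one field with a keyed token."""

    def __init__(self, field: str, key: Optional[bytes] = None) -> None:
        self.field = field
        self._key = key if key is not None else secrets.token_bytes(32)
        self._vault: Dict[str, Any] = {}  # token -> original value, kept by the data owner

    def protect(self, records: List[Record]) -> List[Record]:
        out = []
        for rec in records:
            value = rec[self.field]
            # Truncated to 64 bits for readability; collisions are unlikely but possible.
            digest = hmac.new(self._key, str(value).encode("utf-8"), hashlib.sha256)
            token = digest.hexdigest()[:16]
            self._vault[token] = value
            out.append({**rec, self.field: token})
        return out

    def unprotect(self, records: List[Record]) -> List[Record]:
        # Records that no longer carry the field (e.g. aggregates) pass through unchanged.
        return [{**r, self.field: self._vault[r[self.field]]} if self.field in r else r
                for r in records]


class ProtectionOrchestrator:
    """Applies protection steps in order and removes them in reverse order."""

    def __init__(self, steps: List[PseudonymizeField]) -> None:
        self.steps = list(steps)

    def protect(self, records: List[Record]) -> List[Record]:
        for step in self.steps:
            records = step.protect(records)
        return records

    def unprotect(self, records: List[Record]) -> List[Record]:
        for step in reversed(self.steps):
            records = step.unprotect(records)
        return records


if __name__ == "__main__":
    orch = ProtectionOrchestrator([PseudonymizeField("account"), PseudonymizeField("name")])
    protected = orch.protect([{"account": "ACC-001", "name": "Alice", "amount": 120.0}])
    print(protected)                   # identifiers replaced by tokens
    print(orch.unprotect(protected))   # original values restored for the end user
```

Because the token is deterministic for a given key, analytics such as per-account totals can run on protected data and be re-identified only at delivery time.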
The next step is data processing. The development of ML use cases for digital
finance entails a series of data processing steps, which can be structured in
end-to-end data management pipelines. INFINITECH-RA specifies building blocks
for processing large datasets from their ingestion to the ultimate visualization of 
ML results. This approach is in-line with the operation of most ML platforms 
and tools (e.g., KNIME, Auto-SKLearn, MLBox) which facilitate the development 
and integration of end-to-end pipelines. INFINITECH-RA builds on popular ML 
concepts, as well as on best practices introduced by relevant Reference Architecture 
Models for the finance sector that have been introduced in the industry. 

INFINITECH’s Data Analytics Platforms are products developed to improve the
promptness and accuracy of suspicious-event recognition while reducing costs and
optimizing procedural effort.

The interface group contains projects that provide general-purpose UI functionality.
This might include user interfaces and visualization graphs that can be reused by
pilots to show the results of analytical tools, etc. It is important to highlight
that this group does not include pilot-specific front-end implementations, but only
generic implementations that can be reused by each pilot.

The data visualization layer deals with presenting data to end users.

In summary, the INFINITECH-RA defines layers to logically group components. The
identified layers are: [D2.13]


* Data Sources: At the infrastructure level are the sources of data (database
management systems, data lakes holding unstructured data, etc.).

* Ingestion: A layer of data management usually associated with data import, 
semantic annotation, and filtering from data sources. 

* Security: A layer for managing the clearance of data for security, anonymization,
and cleaning of data before any further storage or processing.

* Management: A layer responsible for the data management aspects, includ- 
ing the persistent storage in the central repository and the data processing 
enabling advanced functionalities such as Hybrid Transactional and Analytical
Processing (HTAP), polyglot capabilities, etc.

* Analytics: A layer for AI/ML components.

* Interface: A layer for the definition of the data to be produced for user interfaces.

* Cross Cutting: A layer with service components that provide functionalities
orthogonal to the data flows (e.g. Authentication, Authorization, ...). 

* Data Model: A cross cutting layer for modelling and semantics of data in the 
data flow. 

* Presentation/Visualization: A layer usually associated with the presentation 
applications (e.g., desktop, mobile apps, dashboards).
Finally, one should note that there are both horizontal and vertical concerns.


* Horizontal concerns cover specific aspects along the data processing chain, 
starting with data collection and ingestion, and extending to data visuali- 
sation. The horizontal concerns do not imply a layered architecture. As an 
example, data visualisation may be applied directly to collected data (the data 
management aspect) without the need for data processing and analytics. 

* Vertical concerns address cross-cutting issues, which may affect all the hori- 
zontal concerns. In addition, vertical concerns may also involve non-technical 
aspects. 
Chapter 4


INFINITECH Data Pack 


4.1 INFINITECH Data Pack 


The INFINITECH Data Pack is the set of files, schemas, and metadata model diagrams
(Graphs) that represent the way the INFINITECH data is organised and structured. It
contains the metadata in .ttl format, as well as in two further formats, .json-ld and
.owl, to ensure that the Data Pack is accessible to different communities. [D4.3]

The INFINITECH Graph Data Model is the documentation that describes in detail all the
taxonomies and vocabularies from the INFINITECH Core, FIBO (https://edmcouncil.org),
FIGI (https://www.openfigi.com) and LKIF (https://github.com/RinkeHoekstra/lkif-core)
domains used in INFINITECH, and that describes and represents all the relationships
between them in order to build the Data Representation of the INFINITECH Graph Data
Model. [D4.3]

The INFINITECH Core ontology compiles the common vocabularies that are used across
different fintech and finance domains in the INFINITECH Project. The objective of the
INFINITECH Core Ontology is to summarize the common terms or vocabularies and to
establish their relationships based on similarities or equivalences. [D4.3]


This ontology is the compilation of vocabularies and taxonomies that are common to the
INFINITECH project and that are used in fintech and finance domains. The main purpose
of the INFINITECH Core ontology is to enable data interoperability and data exchange by
identifying similar vocabularies that are used across different domains but that,
because they pertain to different domains, are not related to each other. INFINITECH
Core establishes and defines the relationships between them, facilitating the
identification and definition of data exchange and data sharing. [D4.3]
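The kind of relationship that INFINITECH Core establishes between similar terms can be sketched as a small JSON-LD document, built here with the standard `json` module. The sketch assumes `owl:equivalentClass` links (one possible way to state equivalence; the Data Pack's actual modelling may differ), and all term IRIs under `https://example.org/` are hypothetical placeholders rather than real INFINITECH Core, banking or insurance identifiers. Because `owl:equivalentClass` is symmetric and transitive, the lookup follows links in both directions.

```python
import json
from typing import Dict, Iterable, Set, Tuple

# Hypothetical namespaces, for illustration only.
CORE = "https://example.org/infinitech-core#"
BANK = "https://example.org/banking#"
INS = "https://example.org/insurance#"

CONTEXT = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def equivalence_graph(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Build a JSON-LD document with one owl:equivalentClass statement per pair."""
    return {
        "@context": CONTEXT,
        "@graph": [
            {"@id": a, "@type": "owl:Class", "owl:equivalentClass": {"@id": b}}
            for a, b in pairs
        ],
    }


def equivalents(doc: dict, term: str) -> Set[str]:
    """All terms reachable from `term` via equivalentClass links, in either direction."""
    adjacency: Dict[str, Set[str]] = {}
    for node in doc["@graph"]:
        a, b = node["@id"], node["owl:equivalentClass"]["@id"]
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, stack = {term}, [term]
    while stack:
        for nxt in adjacency.get(stack.pop(), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {term}


if __name__ == "__main__":
    doc = equivalence_graph([(CORE + "Customer", BANK + "Client"),
                             (CORE + "Customer", INS + "Client")])
    print(json.dumps(doc, indent=2))
    print(sorted(equivalents(doc, BANK + "Client")))
```

Linking each domain term to a shared core term is enough for the banking and insurance terms to become related to each other, which is the interoperability effect described above.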


4.1.1 FIBO, Financial Industry Business Ontology


FIBO is a business conceptual ontology standard providing a description of the 
structure and contractual obligations of financial instruments, legal entities, market 
data and financial processes. The primary application of the business conceptual 
ontology is for data harmonization and for the unambiguous sharing of meaning 
across data repositories. This common language (or Rosetta stone) for the financial 
industry supports business process automation and facilitates risk analysis.
[https://github.com/edmcouncil/fibo]

Figure 4.1 depicts the components of FIBO including foundations, business 
entities, securities , and financial business and commerce. 


4.1.2 FIGI, Financial Instrument Global Identifier


The Financial Instrument Global Identifier is an open standard, unique identifier
of financial instruments that can be assigned to instruments including common
stock, options, derivatives, futures, corporate and government bonds, municipals,
currencies, and mortgage products. [https://www.openfigi.com/api]


Figure 4.1. Financial industry business ontology (FIBO).

Figure 4.2. Financial instrument global identifier - FIGI.

Figure 4.2 illustrates the components of FIGI, including organizations, financial
instruments with their names and identifiers, global identifiers, identifiers, codes,
and tickers.
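The OpenFIGI API referenced above maps existing identifiers (tickers, ISINs, etc.)
to FIGIs. The sketch below only prepares a mapping request; it assumes the v3
mapping endpoint, the `idType`/`idValue`/`exchCode` job fields and the optional
`X-OPENFIGI-APIKEY` header as we understand the public documentation, which should
be checked before use. The IBM ticker and ISIN are used purely as illustrative input.

```python
import json
import urllib.request

# OpenFIGI v3 mapping endpoint as we understand the public docs
# (https://www.openfigi.com/api); verify before relying on it.
OPENFIGI_MAPPING_URL = "https://api.openfigi.com/v3/mapping"


def build_mapping_request(jobs, api_key=None):
    """Prepare, but do not send, a request mapping third-party identifiers to FIGIs."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # optional; per the docs, a key allows higher rate limits
        headers["X-OPENFIGI-APIKEY"] = api_key
    return urllib.request.Request(
        OPENFIGI_MAPPING_URL,
        data=json.dumps(jobs).encode("utf-8"),
        headers=headers,
        method="POST",
    )


jobs = [
    {"idType": "TICKER", "idValue": "IBM", "exchCode": "US"},
    {"idType": "ID_ISIN", "idValue": "US4592001014"},
]
request = build_mapping_request(jobs)
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)  # one result object per job, in order
```

Keeping request construction separate from sending makes the mapping jobs easy to
inspect and batch before any network call is made.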


4.1.3 LKIF, the Legal Knowledge Interchange Format


Legal Knowledge Interchange Format (LKIF) models legal rules of the kind found
in legislation and regulations. LKIF is an Upper and Core ontology from the
ESTRELLA project. The OWL files are available on GitHub. [https://github.com/
RinkeHoekstra/lkif-core] Figure 4.3 shows the components of LKIF, including
norm, action, expression, legal action, and role.


4.1.4 INFINITECH Core


The INFINITECH Core is the baseline of vocabularies and taxonomies used in
the INFINITECH Project; it summarizes the concepts that overlap across the
different financial and insurance technologies. INFINITECH Core indicates whether
a component is reusable and part of the core INFINITECH infrastructure or a
pilot-specific component. The INFINITECH Core follows a top-down approach to
identify all the vocabularies that are common across ontologies and to define the
reference implementation of the INFINITECH Graph Data Model. [D4.4 to D4.6]

Figure 4.3. Legal knowledge interchange format - LKIF.


Rather than extending the reference ontologies and/or identifying new concepts and
relations, a better approach is to create an INFINITECH core ontology that is
grounded on top of the three reference ontologies. To do that, it is necessary to
identify all these commonalities unambiguously while connecting them in one main
model, the so-called INFINITECH core model. Figure 4.4 shows this process and the
connections between FIBO, FIGI, and LKIF. The picture summarizes the relationships
between INFINITECH core components based on subclass, property, and alignment
relations. [D4.4 to D4.6]

Figure 4.4 represents the conceptual alignments between FIBO, LKIF and
FIGI. The three ontologies share common concepts; for example, the organization
concept is defined in all three. We have identified these common concepts and
defined relationships between them. In Figure 4.4, an “equivalentClass” relationship
is defined between the concepts representing an organization in FIBO, FIGI and
LKIF. For the document concept, a “subClassOf” relationship is defined between the
document concept from FIBO and the Document concept from LKIF.

The bottom part of Figure 4.4 shows the diagram legend. Classes are represented
as oval shapes, while properties (relationships) between classes are represented using
solid lines with a filled arrowhead on one end to show the direction of the
relationship. The property (relationship) is shown in the rectangular box attached
to these lines. The “subClassOf” relationship is represented using a dotted line with
a hollow arrowhead on the side of the superclass. The color of a relationship indicates
its origin: green relationships represent alignments defined by us, while black
relationships come from the respective ontology.
[D4.1] 
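The alignments described above can be sketched in a few lines of Python. This is a
minimal illustration, not the INFINITECH implementation: the prefixed class names
(`fibo:Organization`, `lkif:Document`, ...) are placeholders rather than the real
ontology IRIs, and we assume the subClassOf alignment points from the FIBO document
concept to the LKIF Document concept. The sketch merges equivalentClass links into
groups and follows subClassOf links transitively, which is the kind of inference an
OWL reasoner would perform over the real ontologies.

```python
from collections import defaultdict

# Placeholder class names (assumption): real FIBO, FIGI and LKIF IRIs differ.
ALIGNMENTS = [
    ("fibo:Organization", "owl:equivalentClass", "figi:Organization"),
    ("figi:Organization", "owl:equivalentClass", "lkif:Organization"),
    # Assumed direction: the FIBO document concept specializes LKIF Document.
    ("fibo:Document", "rdfs:subClassOf", "lkif:Document"),
]


def equivalence_groups(triples):
    """Merge classes linked by owl:equivalentClass into groups (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:equivalentClass":
            parent[find(s)] = find(o)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return {x: groups[find(x)] for x in list(parent)}


def superclasses(cls, triples):
    """Classes that `cls` is subsumed by, following equivalences and subClassOf."""
    eq = equivalence_groups(triples)
    direct = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            direct[s].add(o)

    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for same in eq.get(current, {current}):
            if same not in seen:
                seen.add(same)
                stack.extend(direct[same])
    return seen - {cls}


print(sorted(superclasses("fibo:Organization", ALIGNMENTS)))
# → ['figi:Organization', 'lkif:Organization']
print(sorted(superclasses("fibo:Document", ALIGNMENTS)))
# → ['lkif:Document']
```

The point of the example is that a single chain of equivalentClass alignments is
enough to make an organization described with FIBO terms visible to queries phrased
in FIGI or LKIF terms.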
The INFINITECH core model and data pack define a lingua franca necessary to
minimize the shortcomings of fragmented data from distinct data silos while har-
monizing the data organization and knowledge representation within enterprises.
In particular, the financial sector is covered by using FIBO as the reference model,
while the insurance sector is covered by LKIF. Furthermore, data pack ontologies and
models for the Internet of Things (IoT) derived from FIESTA-IoT or OpenIoT are also
considered, in order to cover one of the technologies driving the digital
transformation, where data are provided by ubiquitous devices. FIESTA-IoT provides
tools, techniques, processes and best practices enabling IoT testbed/platform
operators to interconnect their facilities in an interoperable way based upon
cutting-edge semantics-based solutions. [INFINITECH Technical Report D1.6]


4.1.5 Example: TAHO (Traffic Analysis Hub Ontology)


TAH has developed a unique platform called the Traffic Analysis Hub (TA Hub),
which ingests vast amounts of media content from diverse sources and builds an
interactive map of people trafficking routes and hotspots. This platform is currently
being used by many NGOs, enforcement agencies, and financial institutions to
identify where they can best focus efforts in identifying and exposing the criminal
gangs behind the trafficking. The TA Hub prototype runs in a secure IBM Cloud
environment that was designed to meet the security needs of these partners, and
it includes IBM’s Watson AI and other analytical tools that analyze blended data
to uncover trafficking hotspots and routes that were not evident before. The
TAH tools also pull in and make sense of open source data, including thousands
of daily public news feeds, to augment the data contributed by consortium
partners, and to develop predictive capabilities in the future. [D6.3]

Pilot #3 leverages innovative INFINITECH technologies/components in order
to meet its objectives. The pilot participants collaborate to develop an AI-driven
capability that applies KYC/KYB methods and semantic technologies to the
transactional data generated by financial activities in order to identify
money-related profiles. Data profiles can then be associated with human profiles
based on their financial activity. These profiles will be built into the Watson AI
engine and combined with existing technology and data sourced from the
TAH human trafficking platform. The results will produce a complete picture of
people’s profiles, people trafficking routes and the corresponding money flows back
to the criminal organizations. [D6.3]

Pilot #3 implements KYC/KYB methodologies and uses emerging technologies,
investigating business innovation opportunities as well as technology innovation
for the banking sector, and exploring ways to reduce the constraints that limit
the development of new data-sharing services. The pilot participants are
primarily financial institutions and organizations investigating the introduction
of a data-sharing capability to improve core banking and business capabilities
as well as financial services. [D6.3]

The pilot proposes that KYC/KYB, a service that is well known and extensively
used in financial and insurance companies when onboarding new customers, can
benefit from the ability to share data securely and effectively between bank
departments, between banks, and with external services. The pilot aims to identify
data patterns that can be related to unlawful activities and innovates by exploring
their potential for fighting human trafficking. KYC/KYB data sharing is key to
improving the financial sector, particularly now that, with the advent of FinTech
companies, more people are investing through online platforms; as these more
disruptive businesses emerge, the banking sector needs to enable the creation of
new products and services. [D6.3]

The target of Pilot #3 is to build data profiles that can be associated, using
INFINITECH semantic technologies, with human profiles based on their financial
activity. These profiles, named Red Flag Typologies, will be built into an AI engine
and combined with existing technology and data sourced from the Traffik
Analysis Hub (TAH) human trafficking platform. The expected result is a capability
to produce a complete financial profile of people who may be involved in illegal
activities, together with trafficking routes and the corresponding money flows back
to the criminal organizations. The adoption of the INFINITECH Way Foundations by
the Pilot #3 application offers numerous business innovations. [D6.3]

The current overall application scenario(s) for business innovation focus
on: [D6.3]


1. Detection of money flows based on transactional data, a topic of continuing
interest in finance and banking.

2. Identification of abnormal operations. This is a routine activity in today’s
banking systems, but detecting current money flows and tracing them across
different banking entities, financial institutions and organizations is still a
challenge, particularly when those transactions are associated with unlawful
operations.

3. Detection of potential human trafficking activities. Human trafficking is one
of the fastest-growing crimes in the world today, representing a $150 billion
industry and infiltrating supply chains at many levels.

4. A better understanding of how to fight and disrupt human trafficking crimes
by placing markers and indicators on transactions and/or operations so that they
can be identified, thus helping to end the misery suffered by its victims.
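Scenarios 1, 2 and 4 revolve around following money across accounts and marking
suspicious patterns. The sketch below is an illustrative assumption rather than the
pilot’s method: it models transfers as `(sender, receiver, amount)` tuples, traces
where money from one account can flow with a breadth-first search, and applies one
hypothetical red-flag rule (repeated transfers just below a reporting threshold,
often called structuring). The threshold, margin and count are placeholder values,
not regulatory figures.

```python
from collections import defaultdict, deque


def trace_money_flows(transfers, source, max_hops=4):
    """Breadth-first trace of the accounts that money from `source` can reach.

    `transfers` is an iterable of (sender, receiver, amount) tuples; account ids
    are assumed to be unique across the participating banks.
    Returns {account: (hops, path)} using the shortest path found.
    """
    outgoing = defaultdict(list)
    for sender, receiver, amount in transfers:
        outgoing[sender].append((receiver, amount))

    reached = {source: (0, [source])}
    queue = deque([source])
    while queue:
        account = queue.popleft()
        hops, path = reached[account]
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for receiver, _amount in outgoing[account]:
            if receiver not in reached:
                reached[receiver] = (hops + 1, path + [receiver])
                queue.append(receiver)
    return reached


def flag_structuring(transfers, threshold=10_000, margin=0.10, min_count=3):
    """Hypothetical red flag: accounts receiving repeated just-below-threshold sums."""
    lower = threshold * (1 - margin)
    counts = defaultdict(int)
    for _sender, receiver, amount in transfers:
        if lower <= amount < threshold:
            counts[receiver] += 1
    return {account for account, n in counts.items() if n >= min_count}


transfers = [
    ("A", "B", 9_500), ("A", "C", 9_800), ("B", "D", 9_600),
    ("C", "D", 9_700), ("D", "E", 38_000), ("X", "Y", 50),
]
flows = trace_money_flows(transfers, "A")
suspicious = flag_structuring(transfers, min_count=2)
```

Here money from `A` reaches `E` in three hops via `B` and `D`, and `D` is flagged
because it receives two transfers just under the placeholder threshold before
forwarding a larger sum.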
Figure 4.5 illustrates the architecture designed for Pilot #3 according to the
INFINITECH-RA. Similar to the workflow defined in the INFINITECH-RA section, it
comprises infrastructure, data protection, data management, data processing and
architecture, data analytics, user interface, and visualization. [D2.13 to D2.15]

This innovation defines the terminology and vocabularies used in the INFINITECH
Ecosystem, establishes the basis for the different taxonomies (e.g. TAHO), and is
included in the INFINITECH White Paper on the INFINITECH Readiness Level (IRL).
[WP4 integrated presentation assets]

Description: 


* This innovation defines the terminology used in the INFINITECH Ecosystem.

* Vocabularies for the INFINITECH Ecosystem.

* Establishes the basis for the different taxonomies, e.g. TAHO.

* Inclusion in the INFINITECH White Paper on IRL.


Business Value: 


* Continuous availability of operations. 
* Interoperability and Data Exchange in financial and insurance sectors. 


Target Market: 

* Banking, Financial Services, Insurance, FinTechs, Insurance Techs.

Ownership:

* NUIG and INOVA, released under Creative Commons.


Figure 4.6 shows the INFINITECH taxonomy, vocabulary and tools for the fintech
sector. It shows a taxonomy, which classifies words into hierarchical groupings; a
controlled language, which defines which words to use, where to use them and their
conditions of use; and a terminology, which is a system of words that belong to a
common subject.


4.1.6 Terminology Used in INFINITECH


Definitions used in the INFINITECH Ecosystem and contributions to the
INFINITECH Readiness Level (IRL).


* Asset - A technology implemented or extended in the context of the
INFINITECH project.

* PoC - An example (slides, diagram, video, demo) of how an INFINITECH
technology works.

* DEMOS - An implemented example of how a pilot could potentially use
the PoC.

* Case Studies - The concept(s) or use cases from a pilot behind a DEMO.

* Exploitation - The potential of a Case Study being implemented as part of an
INFINITECH PoC.

* Integration - The DEMO showing/using INFINITECH
technology/technologies.

* Deployment - A DEMO installed/instantiated in an INFINITECH Server
or at Stakeholder Infrastructure.

* Operation - A DEMO running in an INFINITECH Server or Stakeholder
Infrastructure.

* Marketplace - INFINITECH Server or Stakeholder Infrastructure showing
assets.

* Pilot - All the above + End Users.


DOI: 10.1561/9781638282297.ch5 


INFINITECH Technologies, 
Data and Processes 


5.1 INFINITECH Technologies, Data, and Processes


This section summarizes technology specifications of the building blocks that 
will be used within the pilots and in particular in the pilots’ sandboxes of the 
INFINITECH Project. It is conceived as a reference resource of information for 
the entire project about the components used and/or developed within the project. 
We detail the BigData/IoT technological building blocks that will be developed 
in the scope of the project, in the areas of data management, semantic interoper- 
ability, cost-effective real-time BigData analytics, elastic cloud storage, integrated 
(declarative) data querying, AI/ML algorithms and more. 

This section presents the available tools and applications owned by the Consor- 
tium’s Partners that constitute the background of the technologies exploited in the 
Project. These BACKGROUND Technologies can be the basis to build up other 
components for INFINITECH, by improving them and increasing the related 
TRL. The technologies were identified through a process based on the input
provided by the Consortium Partners. For each tool or application, the following
information is provided: [D2.5-D2.6]


* Title/Name as a short description of the tool/platform.

* Description, in particular the characteristics and/or technology the compo-
nent is based on.

* Documentation or detailed references to links/demo environment.

* ACRONYM of the company or partner in the project who has developed the
tool/platform and is its owner.

* TRL: Technology Readiness Level, current and expected level at the end of the
INFINITECH project.

* Ideas for enhancements to be done in the course of the project.

* License schema (e.g. Proprietary, GPL, Apache, MIT, ...).

* Pilots (usually referred to as P01, P02, etc.).


5.2 Technologies 


The following is a list of possible technologies that may be useful in the fintech
and insurance tech sectors. Most of them are referred to as INFINITECH technolo-
gies because they are included in some of the processes in the pilots, while others
are included because of their relevance to the sector. This list does not aim to be
exhaustive; it provides basic descriptions of all the relevant technologies, as a
summary for a general audience.


5.2.1 Data Ingestion


This technology is tasked with the capture, homogenization, distribution and stor-
age of the datasets that support the connected car pilot. It involves the design and
implementation of the IoT agents that adapt the data available from the vehicle, as
well as deploying the necessary modules for data storage and distribution (FIWARE
Orion, FIWARE Cygnus, etc.). As the size of the data sources increases, the
technology will have to become more robust by improving the performance of the
deployed IoT agents.
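As an illustration of what an IoT agent does, the sketch below maps one raw vehicle
reading to an entity in the FIWARE NGSI v2 format and prepares the request that would
create it in Orion. The Orion URL (1026 is Orion’s default port), the entity id
scheme and the attribute names are assumptions for the example, not the pilot’s
actual data model.

```python
import json
import urllib.request

# Assumed Orion Context Broker endpoint (1026 is Orion's default port).
ORION_ENTITIES_URL = "http://localhost:1026/v2/entities"


def vehicle_entity(vehicle_id, speed_kmh, lat, lon):
    """Map one raw vehicle reading to an NGSI v2 entity (hypothetical attributes)."""
    return {
        "id": f"urn:ngsi-ld:Vehicle:{vehicle_id}",
        "type": "Vehicle",
        "speed": {"value": speed_kmh, "type": "Number"},
        "location": {
            "type": "geo:json",
            # GeoJSON uses [longitude, latitude] order.
            "value": {"type": "Point", "coordinates": [lon, lat]},
        },
    }


def build_create_request(entity):
    """Prepare, but do not send, the POST that creates the entity in Orion."""
    return urllib.request.Request(
        ORION_ENTITIES_URL,
        data=json.dumps(entity).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(vehicle_entity("car-001", 87.5, 40.4168, -3.7038))
# urllib.request.urlopen(request) would submit it to a running Orion instance.
```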


5.2.2 Data Protection Orchestrator (DPO) 


It is an enabler for embedding and automating the assurance of security and privacy
by design and by default in heterogeneous and complex business flows. It orches-
trates various privacy and security management functions (such as access control,
encryption and anonymization). It will be used as part of the data governance
framework of the project and towards establishing the regulatory compliance tools
in the project’s sandboxes. It requires a Swagger specification of the components
(privacy-enhancing technologies, PETs) that the DPO will call via REST. Business
flows will be developed to address the specific communication with these components.


5.2.3 Digital User Onboarding System (DUOS)


It is a solution for dealing with virtual identities on a mobile device. It provides
remote user registration using an eID or passport. It requires eIDs issued by
European national authorities according to the EU eID schemes: eID cards and
passports. The needed improvements will be implemented regarding integration
with end-user applications (e.g. a bank application) that require user authentication.


5.2.4 EASIER-AI 


EASIER-AI is a hybrid (Cloud/Edge) platform that facilitates developing, measuring,
monitoring and deploying AI models. The platform facilitates data science tasks
and focuses on working on hybrid infrastructure and exploiting data generated by
IoT. The platform synchronizes Cloud and Edge, keeping the Edge up to date so that
it always runs the most accurate model. By including this tool in the INFINITECH
project, we aim to feed it with new dataset sources, resulting in the development
of new ML models for the platform.


5.2.5 Driver Profile Classifier 


This technology builds on the fact that high-quality vehicle data allows insurance
companies to offer customized products. Supervised machine learning techniques are
applied to classify drivers’ profiles, from which a customized insurance premium is
generated. The resulting model is deployed with TensorFlow Serving and integrated
into the cloud platform with a wrapper, achieving an accuracy of 85.7%. Including
this model in the project could improve the analysis of the datasets and, in turn,
the model accuracy.
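TensorFlow Serving exposes a REST predict endpoint of the form
`/v1/models/<name>:predict` that accepts a JSON body with an `instances` list and
returns a `predictions` list. The sketch below shows how a wrapper might call it;
the model name `driver_profile`, the host, the three input features and the class
labels are hypothetical, since the text does not specify them, and we assume the
model outputs one score per class.

```python
import json
import urllib.request

# Hypothetical deployment details (not given in the text).
PREDICT_URL = "http://localhost:8501/v1/models/driver_profile:predict"
FEATURES = ["avg_speed_kmh", "harsh_brakes_per_100km", "night_driving_share"]
LABELS = ["cautious", "average", "risky"]


def build_predict_request(trips):
    """Encode feature dicts in TensorFlow Serving's REST 'instances' format."""
    instances = [[float(trip[name]) for name in FEATURES] for trip in trips]
    return urllib.request.Request(
        PREDICT_URL,
        data=json.dumps({"instances": instances}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def profiles_from_response(body):
    """Pick the highest-scoring class per instance from a 'predictions' reply."""
    predictions = json.loads(body)["predictions"]
    return [LABELS[max(range(len(scores)), key=scores.__getitem__)]
            for scores in predictions]


request = build_predict_request(
    [{"avg_speed_kmh": 62, "harsh_brakes_per_100km": 1.5, "night_driving_share": 0.1}]
)
```

Mapping the resulting profile label to a premium is left to the insurer’s own
pricing rules; the wrapper only handles the transport format.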


5.2.6 Distributed Near-real-time HPC Processing and
Exchange of IoT Streaming Data


Optimization of AI algorithms by exploiting GPU capabilities: GPU processors are
used to enhance AI algorithm performance and reinforce CPU capabilities.
5.2.7 Botakis Chatbot Development Network


A tool for rapid development of chatbot applications, which will be used to develop
chatbot features in the INFINITECH pilots (notably the LIB-, BOC- and NBG-led
pilots). The enhancements expected for the Botakis Chatbot Platform, based on the
INFINITECH pilots (notably the GFT- and NBG-led pilots), are:
* Built-in dialogs that use and integrate with existing NLP frameworks (open or
proprietary) provided by partners or any interested party.
* A powerful dialog system with dialogs that are isolated and composable.
* Built-in prompts for simple inputs such as Yes/No, strings, numbers and
enumerations.


5.2.8 Crowdpolicy Open (Innovation) Banking Solution 


Crowdpolicy Open (innovation) banking platform is a set of predefined and cus-
tomizable banking web services and data models integrated with our own API Man-
ager that supports access control, monitoring and authentication. Our solution puts
the bank (or any monetary financial institution) in control of the third-party part-
ner relationship. The solution is fully PSD2 and GDPR compliant. The enhancements
targeted through the INFINITECH project are:
* A technology scale-up to democratize the use and exploitation of open banking
APIs, even for users with no development skills, by building fintech software
development kits.
* A complete programmable framework to integrate different services and APIs
through protocols, providing a user experience similar to Zapier, “Yahoo Pipes”
and “IFTTT”.
The main objective from the innovation perspective is to provide a graphical user
interface for building data and fintech service mashups that aggregate open banking
APIs, openly available datasets and rules, and for creating web-based apps from
various sources and publishing those apps.


5.2.9 AI-Engine-for-Psychometric-Profiling and
Personalization


AI-driven engine to extract four categories of behavioral features, grouped accord-
ing to the type of spending behavior they capture: (i) overall spending behav-
ior, (ii) temporal spending behavior, (iii) category-related spending behavior, and
(iv) customer category profile. The engine uses these features to predict the person-
ality of customers.
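The four feature categories can be illustrated with a small feature extractor. The
concrete features below (totals, weekend and night shares, per-category shares,
dominant category) are our own assumptions chosen to make each category tangible,
as is the transaction record layout; the engine’s actual feature set is not
described in the text.

```python
from collections import defaultdict
from datetime import datetime


def spending_features(transactions):
    """Illustrative features for the four categories named in the text.

    Each transaction is assumed to be {"amount": float, "timestamp": datetime,
    "category": str}; the feature definitions themselves are hypothetical.
    """
    total = sum(t["amount"] for t in transactions)
    count = len(transactions)

    # (i) overall spending behavior
    overall = {"total": total, "count": count, "mean": total / count if count else 0.0}

    # (ii) temporal spending behavior: share of spend at weekends and at night
    weekend = sum(t["amount"] for t in transactions if t["timestamp"].weekday() >= 5)
    night = sum(t["amount"] for t in transactions
                if t["timestamp"].hour >= 22 or t["timestamp"].hour < 6)
    temporal = {"weekend_share": weekend / total if total else 0.0,
                "night_share": night / total if total else 0.0}

    # (iii) category-related spending behavior: share of spend per category
    per_category = defaultdict(float)
    for t in transactions:
        per_category[t["category"]] += t["amount"]
    category = {c: v / total for c, v in per_category.items()} if total else {}

    # (iv) customer category profile: dominant category and spending breadth
    profile = {
        "dominant_category": max(category, key=category.get) if category else None,
        "distinct_categories": len(category),
    }
    return {"overall": overall, "temporal": temporal,
            "category": category, "profile": profile}


sample = [
    {"amount": 30.0, "timestamp": datetime(2024, 3, 2, 23, 0), "category": "restaurants"},
    {"amount": 70.0, "timestamp": datetime(2024, 3, 4, 12, 0), "category": "groceries"},
]
features = spending_features(sample)
```

A personality model would then consume these per-customer feature dictionaries as
its input vector.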


5.2.10 Open Source AI/ML Frameworks


These frameworks facilitate the development of AI/ML-based tools, which shall
be applied to financial crime and fraud, e.g. on so-called Instant Loans. Today
a number of open source tools for AI/ML development are available, and the AI/ML
community is advancing these technologies dynamically. They therefore provide the
basis for solution development and facilitate solving a wide range of specific
business problems, as in INFINITECH. In this way, these open source tools provide
the foundation for developing off-the-shelf modules that are part of the
INFINITECH-RA.


5.2.11 Data Layer - REST API 


A data layer that supports the Security Data Model through a REST API built on a
non-relational database (MongoDB). It supports heterogeneous sources, is developed
with the Flask (Python 3) framework and is dockerized for deployment on Kubernetes
infrastructure. This tool aims to complete the wrapper with standard I/O.


5.2.12 Terrier Information Retrieval Platform


Search Engine for BigData sets that offers integration with Spark for distributed 
processing. We plan to extend Terrier with a new open source module that combines 
real-time data stream ingestion (via Apache Flink) with distributed database access 
(via LeanXcale) for real-time data indexing and updating from multiple sources, 
within WP3 and WP5. We will also expand Terrier with enhanced Python inte- 
gration, allowing easier use from common data science pipelines, such as those 
involving Pandas. This technology can be used for tasks such as searching/sampling 
financial product portfolios, user profiles or for providing recommendations. 


5.2.13 Anonymization Tool


A tool that anonymizes data in order to preserve privacy. It also provides metrics
for measuring the risk of the anonymized data and the impact of the anonymiza-
tion process on the utility of the data. The tool will be used in Pilot #11 and Pilot
#12. The component needs a specific configuration/development for each pilot in
which it is used.
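One common way to measure re-identification risk is k-anonymity: records sharing
the same quasi-identifier values form an equivalence class, and the dataset is
k-anonymous if every class has at least k members. The sketch below computes k and
two derived risk figures before and after a hypothetical generalization (age bands,
truncated postcodes). It is not the tool’s implementation, and it covers only the
risk side; utility metrics are left out.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Size of each equivalence class (records sharing all quasi-identifier values)."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risk_metrics(records, quasi_identifiers):
    """k-anonymity plus worst-case and average re-identification risk."""
    sizes = class_sizes(records, quasi_identifiers)
    k = min(sizes.values())
    return {
        "k": k,                                 # smallest equivalence class
        "max_risk": 1.0 / k,                    # worst-case record
        "avg_risk": len(sizes) / len(records),  # mean of 1/|class| over records
    }


def generalize(record):
    """Hypothetical generalization: 10-year age bands and truncated postcodes."""
    low = (record["age"] // 10) * 10
    return {**record, "age": f"{low}-{low + 9}", "zip": record["zip"][:4] + "*"}


raw = [
    {"age": 34, "zip": "28001", "income": 41_000},
    {"age": 36, "zip": "28002", "income": 52_000},
    {"age": 51, "zip": "28003", "income": 38_000},
    {"age": 57, "zip": "28004", "income": 61_000},
]
anonymized = [generalize(r) for r in raw]
print(risk_metrics(raw, ["age", "zip"]))         # every record is unique: k = 1
print(risk_metrics(anonymized, ["age", "zip"]))  # two classes of two: k = 2
```

Generalization lowers the risk figures at the cost of precision in the
quasi-identifiers, which is exactly the trade-off a utility metric would quantify.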


5.2.14 Polyglot Database Management System


The LXS DBMS is a polystore database that provides access to different and het-
erogeneous datastores via a common interface. It allows the data user to submit
a query whose scan operators can request data stored in external datastores
and combine their intermediate results with data coming from other sources, either
the LXS internal datastore or others. For instance, a JOIN operator might need to
join table A (residing in LXS) and table B (residing in MongoDB or a Hadoop
DataLake). At this phase, only a limited number of target datastores is supported,
as a Proof-of-Concept of the prototype. Moreover, the user has to write queries for
the target datastores in their specific dialect. In addition, JOIN operations are not
efficient, as they require all data residing in an external datastore to be retrieved
at the query engine level. The enhancements planned to be implemented are the
following: (1) provide support for a variety of different datastores, according to the
needs of the pilots; (2) introduce a novel SQL-like query language, so that the data
user can write a query in a seamless way and let the polystore interpret it for the
target datastore; (3) improve the query engine to take into account scan operations
(mostly as part of one of the JOIN arguments) that need to retrieve data from an
external datastore. This will require the query optimizer to explore equivalent
operation graphs based on the nature of the target polystore, the operator to be
able to push down operations to the target store, and the query processor to set up
the corresponding data pipelines during the execution of the query plan. It is not
clear at this moment whether the corresponding wrappers/connectors for each
target datastore will be able to retrieve statistics to feed the query optimizer. This
depends on the target datastores that will be used by the pilots and their
capabilities to expose this type of information.
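The JOIN problem described above can be illustrated with a toy two-store hash join.
In this sketch (all names are hypothetical) an in-memory SQLite table stands in for
table A in LXS and a list of dicts stands in for table B in MongoDB. It contrasts
shipping every external document to the query engine with pushing the filter down
to the external scan, the kind of optimization enhancement (3) targets; it is not
the LXS query engine.

```python
import sqlite3


# Stand-ins (assumptions): SQLite plays the LXS-resident table A, and a list of
# dicts plays table B residing in an external document store such as MongoDB.
def make_internal_store():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Alice", "ES"), (2, "Bob", "GR"), (3, "Carol", "ES")])
    return conn


EXTERNAL_PAYMENTS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 3, "amount": 980.0},
]


def scan_external(predicate=None):
    """Scan operator over the external store; `predicate` models a pushed-down filter."""
    return [d for d in EXTERNAL_PAYMENTS if predicate is None or predicate(d)]


def join_payments(conn, country, min_amount, push_down):
    """Hash join: customers (internal) x payments (external) with amount >= min_amount.

    Returns the joined rows and how many external documents were shipped.
    """
    # Build side: internal rows hashed on the join key.
    build = dict(conn.execute("SELECT id, name FROM customers WHERE country = ?", (country,)))

    def wanted(doc):
        return doc["amount"] >= min_amount

    # Probe side: either filter at the source or ship everything and filter here.
    probe = scan_external(wanted if push_down else None)
    shipped = len(probe)
    if not push_down:
        probe = [d for d in probe if wanted(d)]
    rows = [(build[d["customer_id"]], d["amount"]) for d in probe if d["customer_id"] in build]
    return rows, shipped


conn = make_internal_store()
print(join_payments(conn, "ES", 100.0, push_down=False))  # ships all 4 documents
print(join_payments(conn, "ES", 100.0, push_down=True))   # ships only 2
```

Both plans return the same rows; only the volume of data moved from the external
store differs, which is why the optimizer needs to know what each target store can
evaluate locally.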


5.2.15 HTAP Database for the Financial and Insurance Sector


An ultra-scalable SQL database and real-time big data platform that revolution-
izes business database management systems by introducing a next-generation
business database that can scale in any of the three Vs of Big Data (Volume, Veloc-
ity and Variety). In more technical detail, it provides an ultra-scalable transactional
management system that can scale out to hundreds of nodes (transactional management
is typically a bottleneck in traditional database systems that provide transactional
semantics), while at the same time being fully SQL compatible and ensuring all ACID
transactional properties. It additionally exposes an interface for direct access to its
key-value storage engine, thus providing dual access without downgrading transac-
tional semantics. It offers OLTP and OLAP integration, thus providing support for
HTAP, which allows analytical queries over operational data and realizes the concept
of real-time business intelligence. Finally, it enables polyglot query processing
across different and heterogeneous data sources. Modules of the database will
be used to implement all building blocks of INFINITECH that are related to
data management. LXS background technology will be enhanced to support the
data management building blocks: mainly, it will be enhanced to comply with the
requirements for HTAP support and to be compatible with the target data sources
that need to be accessed via the polyglot mechanism. Moreover, it will provide
support for real-time query processing, enabling queries that combine streaming
data with data at rest. Finally, it will provide support for incremental and parallel
query analytics. However, at this phase of the project, the exact technical details
of these enhancements have not yet been defined.


5.2.16 Natural Language Processing for Real-time,
High-accuracy Credit Risk Assessment


ReportBrain’s NLP functionalities:
* Provide real-time structured feeds of risk-assessment-worthy information sourced
from the news.
* Interlink entities, and update and maintain knowledge graphs in real time with all
the interlinking of entities.
* Use “visual” algorithms to collect and analyze the news in real time, in 65
languages, 24/7.
* Classify articles in real time by their content (politics, business, etc.).
* Identify entities (organizations, persons and locations) in real time using its
own models.
By using the AI enhancements that Reportbrain will develop for the financial
services and insurance sectors, users will be able to add an extra, yet orthogonal,
feed to their existing credit rating models that provides a real-time understanding
of the world and, more specifically, of what is happening with specific entities
(organizations and persons) of interest.


5.2.17 Machine Learning Algorithms for Health Related Data


SILO has implemented such approaches in different health-related projects such as
CrowdHEALTH. SILO will make suggestions for the enhancement of the platform.


5.2.18 Wenalyze Big Data Analytics Platform


A platform that collects and processes information from multiple open data sources
regarding SMEs and applies cognitive algorithms to detect risks and changes in
financial needs. The tool will be used in Pilot 13.


5.2.19 Octopush Geospatial Enabling Framework 


Octopush is a geospatial enabling framework, developed by AgroApps, allowing
the collection, pre-processing, post-processing and distribution of geospatial data
products and services, derived either from remote sensing (satellite, drone) acqui-
sitions or from multidimensional data outputs of numerical simulations. Octopush
gives users access through a centralized access point to decentralized services,
while the Octopush SDK enables IT developers to easily adapt or expand the
provided geospatial services. AgroApps created Octopush to address the company’s
operational and service need for a modular system, independent from any
third-party service provider (excluding those offering raw data, such as Copernicus,
NASA, etc.), and for a framework that can easily be adapted to market needs and
follows the service-oriented business model of the company. Octopush is the baseline
framework that addresses agri-companies’ need for geospatial information, either
through the development of new services and data models or through the adaptation
of existing ones. Some of the services currently offered by Octopush are: a
significant number of vegetation indices derived from optical and SAR imagery,
and crop-specific biophysical parameters including leaf area index, chlorophyll
content and above-ground biomass; crop-specific yield estimation; farm management
information services such as irrigation scheduling and variable-rate fertilization;
weather-driven models of possible pest and disease outbreaks; high-resolution
weather forecasts and specific agrometeorological parameters; and crop damage
assessment services.


5.2.20 AgroApps Weather Intelligence Engine 


Weather Intelligence Engine, developed by AgroApps, is a numerical weather pre- 
diction and atmospheric data assimilation processing chain, based on the WRF 
numerical weather prediction model. The Weather Intelligence Engine operationally
produces all the weather data products (near-real-time, medium-range weather
forecasts, subseasonal-to-seasonal forecasts) needed by the services offered by
AgroApps. The Weather Intelligence Service could take advantage of the available
INFINITECH HPC resources to pilot-test a hybrid ensemble data assimilation scheme
at convective scales.


5.2.21 Sentiment Analysis Tool 


The Reportbrain sentiment analysis tool uses application programming interface (API)
calls to search an existing news article index (an Elasticsearch index). The search
results are processed by Reportbrain’s Sentiment Analyzer, and the outcome is
returned to the caller as a REST API response. This means that articles requested
by an authorized caller are evaluated for sentiment in real time and returned to the
API caller. Sentiment is expressed as 0 for neutral, -1 for negative or +1 for
positive. The purpose of the RB News Article Sentiment API is to provide the
sentiment of articles selected by the user via a query. In particular, the RB News
Article Sentiment API provides several fields that can be used as filters to query the
RB platform: content, language and date. Users can use the sentiment analysis to
personalize their clients’ portfolios, drawing on valuable insights from the
positive/negative/neutral evaluation of news articles about the entity of interest.
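Because each article is scored -1, 0 or +1, per-entity scores can be aggregated by
simple averaging before they feed into a portfolio view. The sketch assumes a
hypothetical response layout (a JSON list of objects with `entity`, `date` and
`sentiment` fields), fictional entity names, and a hypothetical neutral band of 0.2;
the real RB API fields and response format may differ.

```python
import json
from collections import defaultdict


def entity_sentiment(response_body):
    """Average the -1/0/+1 article sentiments per entity.

    The response layout (a JSON list of {"entity", "date", "sentiment"} objects)
    is a hypothetical stand-in for the RB News Article Sentiment API output.
    """
    totals, counts = defaultdict(int), defaultdict(int)
    for article in json.loads(response_body):
        score = article["sentiment"]
        if score not in (-1, 0, 1):
            raise ValueError(f"unexpected sentiment value: {score!r}")
        totals[article["entity"]] += score
        counts[article["entity"]] += 1
    return {entity: totals[entity] / counts[entity] for entity in totals}


def label(score, neutral_band=0.2):
    """Map an average score to a label; the 0.2 band is an assumption."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


sample = json.dumps([
    {"entity": "ACME", "date": "2021-05-01", "sentiment": 1},
    {"entity": "ACME", "date": "2021-05-02", "sentiment": 1},
    {"entity": "ACME", "date": "2021-05-03", "sentiment": 0},
    {"entity": "Globex", "date": "2021-05-01", "sentiment": -1},
    {"entity": "Globex", "date": "2021-05-02", "sentiment": 0},
])
print({entity: label(s) for entity, s in entity_sentiment(sample).items()})
# → {'ACME': 'positive', 'Globex': 'negative'}
```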
5.2.22 Partitioned and Distributed Transaction Graphs


Ethereum and Bitcoin public blockchain transaction datasets. BOUN will make
suggestions for the enhancement of the platform.


5.2.23 ALIDA: A Micro-service Based Platform for 
Composition, Deployment and Execution of BDA 
Applications 


A micro-service based platform for composition, deployment, optimization, exe-
cution and monitoring of big data analytics workflows (covering ingestion, prepa-
ration, analysis and visualization). It is designed and developed on top of
cutting-edge open source Big Data technologies and frameworks.


5.2.24 Text Analysis Tool 


The Reportbrain text analysis tool generates insights from structured, semi-
structured and unstructured text data using natural language processing (NLP).
Such insights include sentiment, key phrases, language, and entities, among
others. The Reportbrain Analytics Engine uses advanced parallel processing and
combines complex NLP tasks in real time to produce the desired results. The text
analysis tool allows companies to better understand all the types of data they are
interested in. After the Text Analysis tool performs a deep analysis of the data lake,
users gain valuable insights about the entity used in the query. Performing the same
process manually would require a tremendous amount of effort and time. The final
objective of the tool is to provide knowledge while ignoring irrelevant information.


5.2.25 Blockchain Tokenization 


Hyperledger Fabric blockchain support for tokens. Enhance Hyperledger Fabric 
with tokenization capabilities for digital trading of assets. 


5.2.26 Healthentia LifeSciences BigData Platform 


A Big Data platform providing data source aggregation and management, as well as
tools for analytics and visualization. It will be used in the IoT-based Life
Insurance pilot (ARI/RRD-led pilot). The platform will be re-purposed to support
the insurance-related monitoring of people for pilot 12.
5.2.27 Event Registry


Event Registry (ER) is a real-time, cross-lingual global media monitoring service
for modelling global social dynamics (eventregistry.org), developed by JSI. ER
aggregates and analyses news content from over 120,000 news sources published
globally in 100+ languages. Events mentioned in the news are identified, and
relevant information about them is automatically extracted and stored in a
searchable form. The data can be accessed directly on the platform or via the API.
ER supports various analytics, including deep analytics of events and of
correlations between events, extracted entities and financial data extracted from
the main financial indexes. Event Registry could be enriched with new insights
from the potential use scenarios, which would help expand its current offering to
the fintech industry. In addition, these scenarios will give the team better insight
for developing further analytics.


5.2.28 StreamStory


StreamStory provides multi-resolution modelling and explanation of (possibly
real-time) streaming data: (1) Exploratory data mining: a system for the analysis
of multivariate time series. It computes and visualizes a hierarchical Markov chain
model that captures the qualitative behaviour of the system's dynamics; (2)
Multi-scale representation: the hierarchical model allows users to interactively
find suitable scales for interpreting the data; (3) Real-time monitoring: it
visualizes streaming data by mapping it to the hierarchical model, and can provide
predictions and alarms for different behaviours.
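The building block of such a hierarchical model is a Markov chain over discrete states. The Python sketch below assumes the observations have already been discretized into state labels (for example by clustering, which is not shown), estimates transition probabilities at a fine scale, and re-estimates them after merging states into coarser ones. The state names and grouping are hypothetical, and this is a simplified illustration of the idea, not StreamStory's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Sequence

State = Hashable


def transition_matrix(states: Sequence[State]) -> Dict[State, Dict[State, float]]:
    """Estimate P(next | current) by counting consecutive state pairs."""
    counts: Dict[State, Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def coarsen(states: Sequence[State], grouping: Mapping[State, State]) -> List[State]:
    """Map fine-grained states to coarser ones (one level up the hierarchy)."""
    return [grouping[s] for s in states]


if __name__ == "__main__":
    fine = ["low", "low", "mid", "high", "mid", "low"]
    print(transition_matrix(fine))
    coarse = coarsen(fine, {"low": "calm", "mid": "calm", "high": "stressed"})
    print(transition_matrix(coarse))
```

Moving between `fine` and `coarse` mirrors, in miniature, how users switch between scales when interpreting the data.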


5.2.29 QMiner


QMiner is an analytics platform for large-scale real-time streams containing
structured and unstructured data. It is designed to scale to millions of data points
on high-end commodity hardware, providing efficient storage, retrieval and
analytics mechanisms with real-time response.


5.2.30 SSC - Super Stream Collider, a Multiformat Data
Management and Query System


The SSC enables distributed, cloud-based, high-performance processing of
semantically linked streams; that is, it is an enabler for semantic analytics. It will
be used for analytics over semantically unified/interoperable streams (in WP5), as
well as in the KYC and customer-centric services pilot (BOI-led pilot).
5.2.31 The Global Engine with Neural Network Intelligence
(GENNI)


An AI-powered engine that executes deep learning (DL) algorithms, based on deep
neural networks, over semantically annotated streams. It can be used in pilots for
customer-centric analytics.


5.2.32 Data Check-in Mechanism


A sophisticated data check-in mechanism that enables the preparation and uploading
of a data provider's (public or confidential) datasets to the cloud platform
developed as one of the results of the ICARUS H2020 project. The data check-in
mechanism is deployed on the data provider's premises as a stand-alone desktop
application. It receives as input a list of data check-in jobs, each incorporating a
set of instructions describing all the actions to be performed on a specific dataset
residing on the local storage of the host operating system, in order to enable the
secure preparation and uploading of new datasets. Internally, the mechanism
orchestrates and executes the designed instructions using incorporated (micro)
services for: (a) mapping the data source entities to the designed common data
schema, (b) cleaning operations on the data source entities, (c) anonymization
operations on the data source entities and (d) encryption of the data source
entities. This list of (micro) services can be expanded based on the needs of each
platform. The data check-in mechanism is offered as a local client for all major
operating systems (Mac, Linux, Windows) and is designed and developed using the
latest desktop application technologies, with the aim of offering end-to-end
security for the data preparation and data upload tasks. This technology served as
the basis for the design and implementation of the INFINITECH Data Collection
component.
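To make the notion of a check-in job concrete, the Python sketch below represents a job as an ordered list of instructions, each naming an action and its parameters, and dispatches them to small step functions for mapping, cleaning and pseudonymization. It is a minimal sketch under assumptions: the action names, parameter keys and record format are hypothetical; the encryption step is omitted because it would need a third-party cryptography library; and salted hashing is only pseudonymization, not full anonymization.

```python
import hashlib
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record], dict], List[Record]]


def map_fields(records: List[Record], params: dict) -> List[Record]:
    """Rename source fields to the common data schema (unmapped fields are kept)."""
    mapping = params["mapping"]
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


def clean(records: List[Record], params: dict) -> List[Record]:
    """Trim whitespace and drop records missing any required field."""
    trimmed = [{k: v.strip() for k, v in r.items()} for r in records]
    return [r for r in trimmed if all(r.get(f) for f in params["required"])]


def pseudonymize(records: List[Record], params: dict) -> List[Record]:
    """Replace the selected fields with a salted SHA-256 digest."""
    salt = params["salt"].encode()
    fields = set(params["fields"])
    return [
        {k: hashlib.sha256(salt + v.encode()).hexdigest() if k in fields else v
         for k, v in r.items()}
        for r in records
    ]


# Hypothetical registry of (micro) services; new actions can be added here.
STEPS: Dict[str, Step] = {"map": map_fields, "clean": clean, "anonymize": pseudonymize}


def run_job(records: List[Record], job: List[dict]) -> List[Record]:
    """Execute the job's instructions in order; each step feeds the next."""
    for instruction in job:
        step = STEPS[instruction["action"]]
        records = step(records, instruction.get("params", {}))
    return records
```

Keeping the steps in a registry reflects the text's point that the list of (micro) services is expandable: adding a capability means registering another function rather than changing `run_job`.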


5.2.33 IoT Catalogue


The ‘IoT Catalogue’ is a web-based catalogue from which to pick and choose IoT
solutions; it is an explorer of innovations in IoT applications and technologies. It
aims to be a single entry point of support for IoT developers, integrators, advisors
and end-users in identifying and selecting IoT technologies (ranging from complete
end-to-end solutions to tools and components/parts), and also in inspecting a wide
set of IoT use cases, their validations, associated contact persons/organizations,
detailed characterization (value propositions, ICT problems, functions, target,
domain), supporting technological solutions, and much more.
5.2.34 Analytics Library


In the scope of the ATMOSPHERE (Adaptive, Trustworthy, Manageable, Orchestrated,
Secure, Privacy-assuring, Hybrid Ecosystem for REsilient Cloud Computing) project,
the UPRC team focused on delivering a library of services that can be used as a
baseline for the INFINITECH library (WP5). The library will be updated to include
metadata relevant to the security and privacy constraints of the INFINITECH
algorithms made available through it.


5.2.35 Catalogue of Objects 


During the 5GTANGO and MATILDA EU projects, both aiming to enable the flexible
programmability of 5G networks and to devise and realize a radical shift in the
development of software for 5G-ready applications, the UPRC team contributed to
the market platform and to the catalogue of services and functions, respectively.
Therefore, the relevant outcomes and the solutions developed by UPRC for these
projects will be used as a baseline for the INFINITECH marketplace, integrating with
UNP's IoT Catalogue. The marketplace will be extended to support a variety of assets
(e.g., datasets and models) as well as composite assets (i.e., analytics pipelines).


5.2.36 Portfolio Optimization Tool 


Privé's goal is to automatically construct an optimized portfolio by modelling the
investment advisory and decision process using AI procedures. The result is a
tailored portfolio for each individual investor. All functionalities will also be
available via API access, as a kind of “FinTech-as-a-service” (FaaS). Our target
market comprises all financial services intermediaries that provide advisory and
wealth management services. Hence banks, insurers, insurance brokers, EAMs,
securities and brokerage firms are the target customers for this solution.
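The source does not describe Privé's optimization procedure. As a reference point for what "optimized portfolio" can mean, the Python sketch below computes the classical fully invested minimum-variance weights, $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$, from a covariance matrix. This is a swapped-in textbook technique, not the tool's AI-based method: it ignores expected returns and investor preferences, and it allows negative (short) weights. The covariance values are illustrative.

```python
import numpy as np


def min_variance_weights(cov) -> np.ndarray:
    """Fully invested minimum-variance weights: w = S^-1 1 / (1' S^-1 1)."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # S^-1 1 without forming the inverse explicitly
    return x / x.sum()


if __name__ == "__main__":
    # Illustrative: two uncorrelated assets with 20% and 10% volatility.
    cov = [[0.04, 0.0], [0.0, 0.01]]
    print(min_variance_weights(cov))
```

With uncorrelated assets the weights are proportional to the inverse variances (25 and 100 here), so the lower-volatility asset receives 80% of the allocation.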


5.2.37 Sentiment Analysis for Financial News


This is a REST API that takes financial news in text form as input and performs
sentiment analysis on the given text. Additionally, the underlying model is
retrained periodically using historical news retrieved from the LXS database. It
enables the classification of financial news according to their impact (i.e.,
positive, neutral or negative) on a given portfolio. This tool can be used in
parallel with other quantitative metrics to provide a comprehensive risk assessment
to stakeholders (traders, risk managers, etc.).




5.2.38 AI Model for VaR Prediction


This component is a REST API that predicts the Value-at-Risk (VaR) and Expected
Shortfall (ES) of several financial portfolios, using both well-established and
innovative techniques. The AI model for VaR prediction takes as input both the
asset prices and the current trading position, and derives VaR and ES estimates at
the 95% and 99% confidence levels using three different models. The estimation
procedures are repeated every minute to take into account the most recently
available data, providing risk assessment in (near) real time. Our target market
comprises institutions exposed to market risk, such as commercial and investment
banks, insurance companies and institutional investors.
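The source does not name the three models. As an illustration of one well-established technique, the Python sketch below derives portfolio P&L from a price history and a fixed position, then estimates VaR and ES by historical simulation. The tail-size convention used (the worst `round((1 - alpha) * n)` losses, at least one) is one of several in use, and nothing here is claimed to be one of the component's models.

```python
from typing import List, Sequence, Tuple


def portfolio_pnl(prices: Sequence[Sequence[float]], position: Sequence[float]) -> List[float]:
    """P&L of a fixed position between consecutive price vectors."""
    return [sum(q * (b - a) for q, a, b in zip(position, prev, curr))
            for prev, curr in zip(prices, prices[1:])]


def historical_var_es(pnl: Sequence[float], alpha: float = 0.95) -> Tuple[float, float]:
    """Historical-simulation VaR and ES at confidence level alpha.

    Losses are negated P&L values. The tail holds the worst m losses, with
    m = round((1 - alpha) * n) and at least 1; VaR is the smallest loss in the
    tail and ES is the mean loss over the tail.
    """
    if not pnl:
        raise ValueError("need at least one P&L observation")
    losses = sorted((-x for x in pnl), reverse=True)
    m = max(1, round((1 - alpha) * len(losses)))
    tail = losses[:m]
    return tail[-1], sum(tail) / m


if __name__ == "__main__":
    print(portfolio_pnl([[10, 20], [11, 19], [9, 21]], [2, 1]))
    sample = [-10, -8, -6, -4, -2] + [1] * 15
    print(historical_var_es(sample, 0.90), historical_var_es(sample, 0.95))
```

Rerunning `historical_var_es` on a rolling window each minute would match the refresh cadence described above; the 95% and 99% levels correspond to `alpha=0.95` and `alpha=0.99`, although at 99% a short window leaves very few tail observations.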


5.2.39 INFINITECH Components Specifications


The INFINITECH Reference Architecture (IRA) [Deliverables D2.13, D2.14] defines a
set of layers and allows flexible workflows to be built out of data processing
modules, called components, that perform specific data transformations. A
workflow usually consumes one set of data (sources) to produce another set of data
(destinations). A component can be identified by its input/output interfaces and
the functionality (transformation) it provides at the edge. In that respect, a data
component can be considered a black box that can be replaced by any technological
implementation that performs the same transformation at the edge.

Figure 5.1 depicts the different reference layers of the IRA along with examples of
data components. The scope of this section is to identify existing and desired data
components that will support the design, development and deployment of the
pilots' sandboxes.

An INFINITECH solution (sandbox) can be built by organizing components in a
workflow (sometimes referred to as a data pipeline) to accomplish a complex
transformation from one set of data sources to another set of data that solves the
business case. Components should therefore be interoperable, with clear interfaces,
and perform a clear function over the data. The rest of this section describes the
existing components and those to be developed in the project. Further progress in
the project will provide the details needed for implementation and deployment.
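The black-box view of components can be sketched in a few lines of Python: a component is anything exposing a name and a `transform` method, and a workflow chains components so that each output feeds the next input. The class and method names below are hypothetical and only illustrate the principle; they are not an INFINITECH API.

```python
from typing import Any, Callable, Iterable, List, Protocol


class Component(Protocol):
    """Black box defined only by its input/output behaviour."""
    name: str

    def transform(self, data: Any) -> Any: ...


class FunctionComponent:
    """Adapter that turns a plain function into a Component."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self._fn = fn

    def transform(self, data: Any) -> Any:
        return self._fn(data)


class Workflow:
    """Chain of components: each output becomes the next component's input."""

    def __init__(self, components: Iterable[Component]) -> None:
        self.components: List[Component] = list(components)

    def replace(self, index: int, component: Component) -> None:
        """Swap in another implementation of the same transformation."""
        self.components[index] = component

    def run(self, data: Any) -> Any:
        for component in self.components:
            data = component.transform(data)
        return data


if __name__ == "__main__":
    clean = FunctionComponent("drop_negatives", lambda xs: [x for x in xs if x >= 0])
    scale = FunctionComponent("double", lambda xs: [2 * x for x in xs])
    wf = Workflow([clean, scale])
    print(wf.run([-1, 2, 3]))
    wf.replace(1, FunctionComponent("double_v2", lambda xs: [x + x for x in xs]))
    print(wf.run([-1, 2, 3]))
```

Both runs produce the same output because the replacement component performs the same transformation at the edge, which is exactly the interchangeability the architecture relies on.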


5.2.40 Data Component Description 


In the following, a first list of the identified components is described using a
standard template, presented in the table below.

Figure 5.1. Logical schema of data processing components in INFINITECH.


5.3 Data Components 


5.3.1 Relational Database


This component is the central data repository of the platform. It provides
transactional semantics and query processing capabilities based on standard SQL
statements. It can scale out at runtime while continuing to serve operational
workloads. It can support analytical processing combined with operational data
modifications, using snapshot isolation as the isolation level; that is, it enables
real-time business analytics. The data repository is written in Java and C, provides
JDBC, ODBC and Python drivers, and runs on a K8S cluster.


5.3.2 Polyglot Query Processing 


This component enables query execution over more than one datastore in a seamless
manner. The data user can submit a single statement and let this component execute
the query by pushing it down to the target database. In this way, the data can be
processed on premise and only the results are retrieved, which is convenient when
data cannot be loaded into the platform and needs to be accessed from an external
datastore. This component is implemented in Java and is an extension of the central
repository itself: it consists of a JAR binary that is loaded into the classpath of
the data repository's query engine. Consequently, it will be deployed as part of the
query engine via the K8S cluster.
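The benefit of query push-down can be illustrated with two in-memory SQLite databases standing in for the central repository and an external datastore. In the Python sketch below the aggregation runs inside the external store, and only one row per client is transferred back and joined locally. The table names and data are hypothetical, and this shows only the push-down principle, not how the polyglot extension is implemented.

```python
import sqlite3
from typing import Dict, List, Tuple


def pushdown_totals(external: sqlite3.Connection) -> Dict[int, float]:
    # The GROUP BY executes inside the external store; only aggregates come back.
    rows = external.execute("SELECT client_id, SUM(amount) FROM trades GROUP BY client_id")
    return dict(rows)


def client_report(local: sqlite3.Connection, external: sqlite3.Connection) -> List[Tuple[str, float]]:
    totals = pushdown_totals(external)
    clients = local.execute("SELECT client_id, name FROM clients ORDER BY client_id")
    return [(name, totals.get(client_id, 0.0)) for client_id, name in clients]


def demo() -> List[Tuple[str, float]]:
    external = sqlite3.connect(":memory:")  # stands in for the external datastore
    external.execute("CREATE TABLE trades (client_id INTEGER, amount REAL)")
    external.executemany("INSERT INTO trades VALUES (?, ?)",
                         [(1, 100.0), (1, 50.0), (2, 75.0)])
    local = sqlite3.connect(":memory:")  # stands in for the central repository
    local.execute("CREATE TABLE clients (client_id INTEGER, name TEXT)")
    local.executemany("INSERT INTO clients VALUES (?, ?)",
                      [(1, "Alice"), (2, "Bob"), (3, "Carol")])
    return client_report(local, external)


if __name__ == "__main__":
    print(demo())
```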


5.3.3 Incremental Analytics


This component enables query execution in an incremental fashion. The data user
can submit a continuous query to the datastore, which is evaluated continuously and
incrementally. This means that the initial results are retrieved first and then, as
data arrives in the data repository, it is validated against the submitted query;
data that satisfies the statement is returned to the user. This component is
implemented in Java and is an extension of the central repository itself: it
consists of a JAR binary that is loaded into the classpath of the data repository's
query engine. Consequently, it will be deployed as part of the query engine via the
K8S cluster.
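The evaluation pattern described above (initial results first, then each arriving row checked once against the query) can be sketched in Python as follows. The class name, the row format and the example predicate are hypothetical. The sketch runs in memory and provides none of the repository's transactional guarantees.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class ContinuousQuery:
    """Continuous query: initial results first, then only matching new rows."""

    def __init__(self, predicate: Callable[[Row], bool], existing: List[Row]) -> None:
        self.predicate = predicate
        # Initial evaluation over the data already stored.
        self.initial = [row for row in existing if predicate(row)]

    def on_insert(self, row: Row) -> List[Row]:
        # Each newly arrived row is validated once; stored data is not rescanned.
        return [row] if self.predicate(row) else []


if __name__ == "__main__":
    q = ContinuousQuery(lambda r: r["amount"] > 10000, [{"amount": 500}, {"amount": 20000}])
    print(q.initial)
    print(q.on_insert({"amount": 15000}), q.on_insert({"amount": 5}))
```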


5.3.4 OneHotEncoder 


A service that represents categorical variables as binary vectors. It is a BDA service
registered in the ALIDA catalogue: a PySpark-based micro-service running in K8S
Spark cluster mode as part of the ALIDA framework. In a nutshell, ALIDA is a
micro-service based platform for the composition, deployment, optimization,
execution and monitoring of pipelines of Big Data Analytics (BDA) services. ALIDA
is the result of previous research activities carried out by ENG and is currently a
work in progress. ALIDA offers a catalogue of BDA services (ingestion, preparation,
analysis, visualization): users design their own (stream/batch) pipeline by choosing
BDA services from it, indicate which big data set they want to process, launch and
monitor the execution of the pipeline, and personalize the visualization of the
results by choosing from a set of available graphs, all without needing software
development skills or particular knowledge of big data technologies. This service is
registered in the ALIDA catalogue as a Spring Boot application containing the
Python code and its dependencies. After the algorithm is implemented using PySpark,
the Dockerfile is created and the new image is pushed to a repository, the
micro-service is registered in the ALIDA catalogue through the GUI. Source:
https://home.alidalab.it/.
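The transformation performed by the service can be illustrated without Spark. The plain-Python sketch below assigns each category a fixed position (in sorted order) and encodes every value as a vector with a single 1. It is an illustration only. Spark ML's `OneHotEncoder` operates on category indices (typically produced by `StringIndexer`), outputs sparse vectors and, by default (`dropLast=True`), drops the last category, so its output is shorter than this full dense encoding.

```python
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode each categorical value as a binary vector with a single 1."""
    categories = sorted(set(values))  # fixed, reproducible category order
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


if __name__ == "__main__":
    print(one_hot(["car", "home", "car", "life"]))
```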
5.3.5 Stream Processor


This component will provide stream processing capabilities. The data user can
declare continuous queries to be executed over the data stream. It will also allow
streaming data to be combined with data at rest, and will enable data streams to be
stored even when they are injected at very high rates. This component will be based
on Apache Flink and will be containerized for deployment on a K8S cluster. It can
exploit the capabilities of declarative real-time analytics. This is very useful when
the data user wants to calculate a value over a stream that would require an
expensive scan operation over a data table (e.g., comparing the input with the
overall average of a field in a table). Since scan operations (and operations that
require a scan, such as the average) have a complexity greater than O(1), they are
costly in time and cannot be executed within a stream. For that reason, developers
often cache such a value and update it periodically. With declarative real-time
analytics, the data user can declare such an analytical operation (e.g., the overall
average) in an SQL fashion, and the query is executed with the complexity of a get
operation. This allows such analytical operations to be included in a stream
operation, providing it with the current average while respecting data consistency
and isolation in terms of ACID properties and transactional semantics.
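The key idea, reading an aggregate in O(1) instead of scanning the table, can be sketched in Python by maintaining a running sum and count as rows are stored. Each stream event is compared with the current average, and the event is then stored, which updates the aggregate. The comparison rule (flag values above twice the average) is a hypothetical example. The sketch omits the ACID guarantees and concurrency control that the component provides.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningAverage:
    """Average maintained incrementally, so reading it is O(1) instead of a scan."""
    total: float = 0.0
    count: int = 0

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

    def get(self) -> float:
        return self.total / self.count if self.count else 0.0


def flag_above_average(stream: Iterable[float], table_avg: RunningAverage,
                       factor: float = 2.0) -> Iterator[float]:
    """Yield events whose value exceeds factor times the current table average."""
    for value in stream:
        if table_avg.count and value > factor * table_avg.get():
            yield value
        table_avg.add(value)  # the event is also stored, updating the aggregate


if __name__ == "__main__":
    avg = RunningAverage()
    for v in [10, 10, 10]:  # data already at rest in the table
        avg.add(v)
    print(list(flag_above_average([15, 50, 12], avg)), avg.get())
```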


5.3.6 Online Aggregates


This component allows aggregate processing operators to be executed in an online
manner. The aggregate operations are defined in advance, and the result of their
execution is pre-calculated online, preserving data consistency and transactional
semantics. When requested, the result can be retrieved with a GET operation,
removing the need to scan the whole dataset. This component is based on the
relational database component and extends its core storage and query engine. It
runs on a K8S cluster.


5.3.7 Data Collection


This component provides the data ingestion mechanism that: (a) enables the
acquisition and retrieval of heterogeneous data from a variety of diverse data
sources and data providers, (b) facilitates the annotation of the retrieved data by
enabling the mapping between the data entities included in the retrieved data and
the data model provided by the data provider, and (c) enables the design and
execution of data cleaning operations to increase the quality of the retrieved data.
This component is based on the Java and Python programming languages, and
leverages the Java Spring Boot, Flask, pandas and NumPy frameworks and libraries.
It provides a highly configurable mechanism capable of addressing the various
connectivity and communication challenges raised during data ingestion. Hence,
data providers can configure this mechanism to execute data collection pipelines
that include data retrieval, data annotation and data cleaning operations tailored
to their needs.


5.3.8 Anonymization Tool


The anonymization tool modifies data in order to preserve privacy. It is especially
suited to cases where a dataset contains personal data and has to be outsourced or
shared with a third party. The tool includes different anonymization algorithms that
aim to prevent the appearance of data combinations that could lead to the
re-identification of data subjects. It also includes a set of privacy and utility
metrics that measure the risk remaining after anonymizing the dataset and the impact
of the anonymization process on data quality. The tool is based on two modules that
can be deployed as a Docker container. It requires a relational database from which
it retrieves, and in which it stores, both the raw and the anonymized versions of
the data.
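A widely used privacy metric of the kind mentioned above is k-anonymity: the size of the smallest group of records that share the same quasi-identifier values, so that each record's re-identification risk is at most 1/k. The Python sketch below measures k before and after a simple generalization (age into 10-year bands, postcode truncated to three digits). The quasi-identifiers, the generalization rules and the records are hypothetical, and the tool's own algorithms and metrics may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, object]


def generalize(record: Record) -> Record:
    """Hypothetical generalization: age to a 10-year band, postcode to 3 digits."""
    out = dict(record)
    low = (int(record["age"]) // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    out["zip"] = str(record["zip"])[:3] + "**"
    return out


def k_anonymity(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


if __name__ == "__main__":
    raw = [
        {"age": 34, "zip": "10115", "claim": "A"},
        {"age": 37, "zip": "10117", "claim": "B"},
        {"age": 52, "zip": "20095", "claim": "A"},
        {"age": 58, "zip": "20097", "claim": "C"},
    ]
    anonymized = [generalize(r) for r in raw]
    for label, data in (("raw", raw), ("generalized", anonymized)):
        k = k_anonymity(data, ["age", "zip"])
        print(label, "k =", k, "max re-identification risk =", 1 / k)
```

The utility side of the trade-off is visible too: after generalization, exact ages and postcodes are no longer available to downstream analytics.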


5.3.9 DUOS (Digital User On-boarding Services)


Provides remote user registration using an eID or passport, handling virtual
identities on a mobile device. It uses various identity proofing and verification
services that link the creation of a new user eID (virtual or derived eID) with a
government-issued eID. It verifies the electronic data stored on the chip and the
machine-readable zone (MRZ). It provides flexible multi-factor authentication for
different users or identities. Different underlying licenses apply (Apache 2.0 for
the MRZ reader; the license for biometric checking is to be decided).
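Part of verifying a machine-readable zone is checking its check digits. The Python sketch below implements the check-digit rule from ICAO Doc 9303: characters map to values (digits as themselves, A to Z as 10 to 35, the filler `<` as 0), which are multiplied by the repeating weights 7, 3, 1 and summed, and the result is taken modulo 10. How DUOS performs this internally is not described in the source; the sample fields are illustrative.

```python
def mrz_check_digit(field: str) -> int:
    """ICAO Doc 9303 check digit: weights 7, 3, 1 repeating, sum mod 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch in "0123456789":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def verify_field(field: str, check: str) -> bool:
    """True if the printed check character matches the computed digit."""
    return check in "0123456789" and len(check) == 1 and mrz_check_digit(field) == int(check)


if __name__ == "__main__":
    # Document number, birth date (YYMMDD) and expiry date with their check digits.
    for field, check in (("L898902C3", "6"), ("740812", "2"), ("120415", "9")):
        print(field, verify_field(field, check))
```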


5.3.10 DPO (Data Protection Orchestrator) 


The DPO embeds and automates the assurance of security and privacy by design and
by default in complex business flows. It orchestrates Privacy Enhancing
Technologies (PETs) and related services using BPM tools, in order to integrate
privacy and data protection perspectives into business processes. It can
orchestrate any kind of REST service. The DPO interacts with the privacy expert, who
prepares the business flow in a BPMN file. The flow interacts with PETs such as
anonymization.




5.3.11 Blockchain Reader 


Fetches the requested data from the blockchain ledger. The specific component is
part of a Blockchain chaincode. As the chaincode is tightly connected to the business
operation performed on top of different business objects, different flavours of the
chaincode exist depending on the business use cases (Consent Management, Know
Your Customer/Know Your Business, Asset Tokenization).


5.3.12 Blockchain Writer 


Submits transactions to the blockchain ledger. The specific component is part of a
Blockchain chaincode. As the chaincode is tightly connected to the business
operation performed on top of different business objects, different flavours of the
chaincode exist depending on the business use cases (Consent Management, Know
Your Customer/Know Your Business, Asset Tokenization).


5.3.13 Smart Contract Executor


Executes smart contracts on the blockchain ledger. The specific component is part 
of a Blockchain chaincode. As the chaincode is tightly connected to the business 
operation that is performed on top of different business objects, different flavours 
of the chaincode exist depending on the business use cases (Consent Management, 
Know Your Customer/Know Your Business, Asset Tokenization). 


5.3.14 Blockchain Data Visualizer


Queries and displays information about blocks, transactions, chaincodes and
transaction families, the network name, status and node list, the organization list
and the peer list. The specific component is part of a Blockchain chaincode. As the
chaincode is tightly connected to the business operation performed on top of
different business objects, different flavours of the chaincode exist depending on
the business use cases (Consent Management, Know Your Customer/Know Your
Business, Asset Tokenization).


5.3.15 Blockchain Authenticator 


Grants access to specific channel(s) of the blockchain network. The specific
component is part of a Blockchain chaincode. As the chaincode is tightly connected
to the business operation performed on top of different business objects, different
flavours of the chaincode exist depending on the business use cases (Consent
Management, Know Your Customer/Know Your Business, Asset Tokenization).




5.3.16 Blockchain Encryptor 


Encrypts the clients' sensitive data within the smart contract using AES-256. The
specific component is part of a Blockchain chaincode. As the chaincode is tightly
connected to the business operation performed on top of different business objects,
different flavours of the chaincode exist depending on the business use cases
(Consent Management, Know Your Customer/Know Your Business, Asset Tokenization).


5.3.17 Blockchain Decryptor 


Decrypts data retrieved from the blockchain ledger. 


5.3.18 Blockchain Transaction Dataset Preparation Component 


This component will be responsible for retrieving raw transaction blocks from the
Bitcoin and Ethereum blockchains and parsing them in order to extract Bitcoin,
Ethereum and major token transactions. After all blocks up to the present have been
retrieved, this component will be run periodically (e.g., once a week or as needed)
to retrieve the blocks generated during that period. Blocks must be retrieved either
from nodes attached to the blockchain (e.g., running Parity) or from blockchain data
supplier gateways (Google, Infura, Cloudflare). Technologies: the Web3.js Ethereum
JavaScript API, and the ConsenSys abi-decoder for parsing smart contract call data.
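To show what parsing smart contract call data involves, the Python sketch below decodes the most common token call, an ERC-20 `transfer(address,uint256)`. Its call data is the 4-byte function selector `0xa9059cbb` followed by two 32-byte ABI words (the recipient address, left-padded, and the amount). The selector is hardcoded because Python's `hashlib.sha3_256` implements NIST SHA-3, which differs from Ethereum's Keccak-256. A general decoder such as abi-decoder works from the full contract ABI. This sketch handles only this single call shape and is not the component's code.

```python
from typing import Optional, Tuple

# First 4 bytes of keccak256("transfer(address,uint256)"), the ERC-20 transfer selector.
TRANSFER_SELECTOR = "a9059cbb"
WORD = 64  # hex characters in one 32-byte ABI word


def decode_erc20_transfer(calldata: str) -> Optional[Tuple[str, int]]:
    """Return (recipient, raw_amount) for an ERC-20 transfer call, else None."""
    data = calldata[2:] if calldata.startswith("0x") else calldata
    if len(data) != 8 + 2 * WORD or data[:8].lower() != TRANSFER_SELECTOR:
        return None
    to_word, amount_word = data[8:8 + WORD], data[8 + WORD:8 + 2 * WORD]
    recipient = "0x" + to_word[-40:].lower()  # the address is the low 20 bytes of the word
    amount = int(amount_word, 16)  # raw token units, before applying the token's decimals
    return recipient, amount


if __name__ == "__main__":
    recipient = "ab" * 20
    calldata = "0x" + TRANSFER_SELECTOR + "0" * 24 + recipient + format(10**18, "064x")
    print(decode_erc20_transfer(calldata))
```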


5.3.19 Scalable Transaction Graph Analysis Component 


This component will be responsible for processing massive Bitcoin and Ethereum
public transaction data. Since the transaction graph is massive and growing, it will
use parallel algorithms to achieve scalability, and it will use graph and machine
learning algorithms to analyse fraudulent transactions. Technologies: parallel
processing (high-performance computing), the Message Passing Interface (MPI), graph
partitioning software such as Scotch and/or METIS, distributed graph algorithms and
machine learning.
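Graph partitioners such as METIS and Scotch aim to split the vertices so that few edges cross partitions, because each cut edge implies communication between parallel workers. The Python sketch below assumes a partition assignment has already been produced (the two assignments shown are hand-made, not partitioner output). It measures the edge cut, groups edges by the partition that owns their sender (as one MPI rank might process them), and computes per-address out-degree, a simple feature that could be passed to a fraud model. All function names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (sender, receiver, amount)


def edge_cut(edges: List[Edge], part: Dict[str, int]) -> int:
    """Number of transactions whose endpoints lie in different partitions."""
    return sum(1 for s, r, _ in edges if part[s] != part[r])


def edges_by_owner(edges: List[Edge], part: Dict[str, int]) -> Dict[int, List[Edge]]:
    """Assign each edge to the partition (worker) that owns its sender."""
    owned: Dict[int, List[Edge]] = defaultdict(list)
    for edge in edges:
        owned[part[edge[0]]].append(edge)
    return dict(owned)


def out_degree(edges: List[Edge]) -> Dict[str, int]:
    """Per-address fan-out, computed locally by each worker on its own edges."""
    deg: Dict[str, int] = defaultdict(int)
    for s, _, _ in edges:
        deg[s] += 1
    return dict(deg)


if __name__ == "__main__":
    edges = [("A", "B", 1.0), ("B", "C", 2.0), ("C", "D", 3.0), ("D", "A", 4.0), ("A", "C", 5.0)]
    good = {"A": 0, "B": 0, "C": 1, "D": 1}
    bad = {"A": 0, "B": 1, "C": 0, "D": 1}
    print(edge_cut(edges, good), edge_cut(edges, bad))
    print(edges_by_owner(edges, good))
```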


5.3.20 Semantic Streams Analyzer


Semantic Streams Analyzer Middleware-Engine (SeSAME). The SeSAME component is a
data mashup builder for the financial sector that can be used as a data processing
component in a data management application. It enhances the capacity to process
financial and insurance data in the form of batches and provides a single output,
which makes it ideal when multiple sources have different data formats. It is built
to be compatible with the most common data formats in the financial and insurance
sectors, i.e., FIBO, FIGI and LKIF, and it additionally uses the INFINITECH Core
Graph Data Model to enhance performance. SeSAME is designed as a
dataflow/workflow execution framework that connects various data inputs/outputs
through the concept of pipelines to create the data mashup. Conceptually, each
financial operator has input data or streams, and SeSAME provides one output data
set or stream. Multiple inputs can be used simultaneously, while a single output in
RDF is provided. Only the final operator of a workflow can return a format other
than RDF, if necessary, by defining and transforming the data into the desired
format. Operators can work in three modes via APIs: (1) a data acquisition operator
collects or receives data from data sources or gateways and can be pull-based or
push-based; (2) a stream processing operator defines stream processing
functionality in a declarative language, e.g., CQELS; (3) a streaming operator
streams the outputs of the final operator of a workflow to the consuming
applications. In all three API modes, data transformation and alignment operators
can be applied to produce a normalised RDF output format.


5.3.21 Semantic Reasoner 


Enhanced Distributed Reasoner over FinTech Ontologies (EnDoRFIN) Semantic
Reasoner. The EnDoRFIN component is a tool for inferring knowledge from data
streams: it uses rules to define logical conditions and, as a result, provides the
logical consequences as outcomes. The inference rules are defined based on the most
commonly used financial and insurance vocabularies, i.e., FIBO, FIGI and LKIF, and
the rules are processed through APIs used to define the logical descriptions for the
data applications. This component allows other vocabularies to be used, but they
need to be upgraded to the target vocabulary. Additionally, EnDoRFIN uses the
INFINITECH Core Graph Data Model to ensure that inference is applicable to all the
domains involved in fintech.
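Rule-based inference of this kind can be illustrated with forward chaining: rules are applied to a set of triples repeatedly until no new triple appears (a fixpoint). The Python sketch below uses two RDFS-style rules: subclass transitivity, and inheritance of a type through the subclass hierarchy. The `ex:` names are hypothetical placeholders, not FIBO, FIGI or LKIF terms, and EnDoRFIN's rule language and distributed engine are not represented here.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two rules until no new triple is produced (fixpoint).

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in facts if p == TYPE for a2, b in sub if a == a2}
        if new <= facts:
            return facts
        facts |= new


if __name__ == "__main__":
    base = {
        ("ex:Mortgage", SUBCLASS, "ex:Loan"),
        ("ex:Loan", SUBCLASS, "ex:FinancialProduct"),
        ("ex:contract42", TYPE, "ex:Mortgage"),
    }
    for triple in sorted(infer(base) - base):
        print(triple)
```

The inferred triples (for example, that `ex:contract42` is also an `ex:FinancialProduct`) are the "logical consequences" the text refers to; a production reasoner would add incremental evaluation over streams rather than recomputing from scratch.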


5.3.22 Ontology Mapping


INFINITECH Graph Data Model: Online Ontology Mapping Framework and Toolkit. The
INFINITECH Graph Data Model is the set of online tools referring to the graphs,
formats, vocabularies and ontologies used in the INFINITECH project. It is provided
as a set of online accessible files, schemas and metadata model diagrams that
represent how INFINITECH data can be organised and structured; it also contains the
metadata in two different formats, JSON-LD and OWL. The Ontology Mapping
Ontologies section contains online machine-readable files in both OWL and JSON-LD
format for online accessibility; both files are maintained and regularly updated,
using a versioning method, to keep the ontology files up to date.


5.3.23 Semantic Annotator-Preprocessing


Semantic Annotator-Middleware Preprocessing Layer for FinTechs (SAMPLe-Fin).
INFINITECH SAMPLe-Fin is the online support tool for transforming datasets into an
RDF-compatible format. Besides the online tool, documentation is provided
describing the steps needed to transform datasets from any data exchange format
(e.g., CSV, XLS) into RDF. This tool is provided as an enabler for a semantic layer
in which enriched data can be processed more efficiently. INFINITECH SAMPLe-Fin is
the mechanism INFINITECH uses to address cross-domain and cross-pilot data
interoperability and data exchange, and it also provides the pre-processing layer
for the interoperability requirements of the INFINITECH project.
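As a minimal illustration of transforming a tabular format into RDF, the Python sketch below converts CSV text into N-Triples. Each row becomes a subject IRI (base plus the row identifier), and each non-empty column becomes a predicate IRI with a string literal value. It rests on explicit assumptions: the `http://example.org/` base is a placeholder, column names and identifiers are assumed to be safe to embed in IRIs without escaping, and every value becomes a plain string literal with no datatypes or mapping to FIBO terms. It is not SAMPLe-Fin's documented procedure.

```python
import csv
import io
from typing import List


def _literal(value: str) -> str:
    """Escape a string as an N-Triples plain literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str, base: str, id_column: str) -> List[str]:
    """One subject per row (base + id), one triple per non-empty column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            triples.append(f"{subject} <{base}{column}> {_literal(value)} .")
    return triples


if __name__ == "__main__":
    data = "id,name,country\nc1,Acme,GR\n"
    print("\n".join(csv_to_ntriples(data, "http://example.org/", "id")))
```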


5.3.24 Smart Fleet (IoT Context Management and Historical
Data Component)


A FIWARE-based framework designed to capture, homogenise, process and distribute
real-time traffic and smart vehicle information (it will also allow other related
context information). It will implement publish/subscribe mechanisms and support
geolocation and time series tools. Additional tools to build custom dashboards will
be included. It comprises customised technological building blocks based on the
FIWARE architecture: the Context Broker (Orion) [NGSI & ETSI NGSI-LD], which
distributes context data between any combination of data producers and data
consumers, and Historical Data (QuantumLeap) [ETSI NGSI-LD], which persists the
data managed by the Context Broker.
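Context data exchanged through an NGSI-LD broker is expressed as entities whose attributes are typed as `Property` or `GeoProperty`. The Python sketch below builds such an entity for a single vehicle reading. The attribute names (`speed`, `location`) and the vehicle identifier are illustrative assumptions rather than a confirmed Smart Fleet data model. The `@context` shown is the ETSI NGSI-LD v1 core context and should be checked against the broker version in use. Such a payload would typically be sent with a POST to `/ngsi-ld/v1/entities` using `Content-Type: application/ld+json`; no request is made here.

```python
import json
from typing import Dict

NGSI_LD_CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def vehicle_entity(vehicle_id: str, speed_kmh: float, lon: float, lat: float,
                   observed_at: str) -> Dict:
    """Build an NGSI-LD entity for one vehicle reading (attribute names are illustrative)."""
    return {
        "id": f"urn:ngsi-ld:Vehicle:{vehicle_id}",
        "type": "Vehicle",
        "speed": {"type": "Property", "value": speed_kmh, "observedAt": observed_at},
        "location": {
            "type": "GeoProperty",
            # GeoJSON uses (longitude, latitude) order.
            "value": {"type": "Point", "coordinates": [lon, lat]},
        },
        "@context": NGSI_LD_CORE_CONTEXT,
    }


if __name__ == "__main__":
    payload = vehicle_entity("car-001", 82.5, 23.7275, 37.9838, "2021-06-01T10:00:00Z")
    print(json.dumps(payload, indent=2))
```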


5.3.25 Fraud Detection Service Training


This component trains the corresponding Machine Learning model that assigns a
driver profile and helps to identify driver behaviour using the provided driving
routes and the vehicle's technical data.


5.3.26 Fraud Detection Service Execution 


This component provides the drivers’ profile and classification (behaviour) for the 
last given driving route. 
5.3.27 Pay As You Drive Service Training


This component trains the Machine Learning model that will classify the drivers'
behaviour while driving, according to the data collected from the driven cars
(driving routes).


5.3.28 Pay As You Drive Service Execution 


This component will classify the drivers' behaviour while driving, according to the
data collected from their car and exploiting the Driver Classifier/Driving Profiling
ML models.


5.3.29 Investment Recommendation Engine Training


This component trains the personalized Investment Recommendation engine to
provide a set of recommendations for the financial instrument categories suitable
for each customer and their investment profile, based on Market Index & Financial
Instruments Sentiment data and Customer Risk Profiling ML models. The machine
learning models are implemented in Python, mainly based on scikit-learn.


5.3.30 Investment Recommendation Engine Execution


A personalized investment recommendation engine for financial instruments, suitable
for each customer and their investment risk profile, based on Market Index &
Financial Instruments Sentiment data.


5.3.31 Recommender


Generates actionable insights depending on the output of the other components
within the Data Analytics layer, i.e., components P5b Analytics 02 to P5b
Analytics 08.


5.3.32 Cash Flow Prediction


An ML/AI model used to indicate and predict the available working capital (operating
cash flow) of the SME (as-is and in the near-term future). Alerts/notifications are to
be pushed (via BOC Middleware PushNotifications) to the respective SME in case of a
potential lack of liquidity and/or when the balance moves below a threshold. Cash
flow data and insights are to be provided to the BOC Middleware Mobile/Web BPE. In
order to provide valuable insights to the SME, the data should be collected and
streamed in real (or near-real) time, since the model should retrain and adapt
whenever a new object/entry appears.
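The alerting logic downstream of the prediction can be sketched in Python. Given an opening balance and expected daily net flows, the code projects the balance day by day and reports the first day it falls below a threshold, which is the condition that would trigger a notification. The flows are supplied as inputs here, whereas in the described component they would come from the ML model. The dates, amounts and threshold are illustrative, and the BOC notification channel is not modelled.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

Path = List[Tuple[date, float]]


def project_balance(opening: float, flows: Dict[date, float], start: date, days: int) -> Path:
    """Day-by-day balance given expected net flows (positive values are inflows)."""
    balance, path = opening, []
    for i in range(days):
        day = start + timedelta(days=i)
        balance += flows.get(day, 0.0)
        path.append((day, balance))
    return path


def first_breach(path: Path, threshold: float) -> Optional[Tuple[date, float]]:
    """First projected day on which the balance drops below the threshold, if any."""
    return next(((d, b) for d, b in path if b < threshold), None)


if __name__ == "__main__":
    flows = {date(2021, 3, 2): -6000.0, date(2021, 3, 4): -5000.0, date(2021, 3, 5): 8000.0}
    path = project_balance(10000.0, flows, date(2021, 3, 1), 5)
    print(first_breach(path, 2000.0))
```

Even though the balance recovers on the following day, the projected shortfall would be reported ahead of time, which is when a liquidity alert is useful.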
5.3.33 Budget Prediction


An AI (ML) model is used to support budget target setting for the various
categories used by the respective SME, by providing budget predictions for each
category used. The underlying model will take into consideration the cash flow
analysis output, benchmarks, macroeconomic data and other available SME data
(business plan).


5.3.34 KPI Engine


The KPI engine calculates KPIs regarding the financial health and performance of
the respective SME. It does so by taking into consideration the SME profile (e.g.
maturity stage), accounting-wise optimal KPI values and how other similar SMEs
perform.
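A minimal sketch of the peer comparison, assuming two textbook KPIs (current ratio and net margin) and a benchmark defined as the median of peers in the same maturity stage. The field names and the choice of KPIs are hypothetical; the engine's actual KPI set and "optimal value" rules are not given in the text.

```python
from statistics import median


def compute_kpis(sme):
    """Two standard financial-health KPIs from hypothetical balance-sheet fields."""
    return {
        "current_ratio": sme["current_assets"] / sme["current_liabilities"],
        "net_margin": sme["net_income"] / sme["revenue"],
    }


def benchmark(sme, peers):
    """Compare each KPI with the median of peers in the same maturity stage."""
    own = compute_kpis(sme)
    peer_kpis = [compute_kpis(p) for p in peers if p["stage"] == sme["stage"]]
    report = {}
    for name, value in own.items():
        peer_median = median(k[name] for k in peer_kpis) if peer_kpis else None
        report[name] = {
            "value": value,
            "peer_median": peer_median,
            "above_peers": None if peer_median is None else value >= peer_median,
        }
    return report


sme = {"stage": "growth", "current_assets": 150.0, "current_liabilities": 100.0,
       "net_income": 12.0, "revenue": 200.0}
peers = [
    {"stage": "growth", "current_assets": 120.0, "current_liabilities": 100.0,
     "net_income": 20.0, "revenue": 200.0},
    {"stage": "growth", "current_assets": 200.0, "current_liabilities": 100.0,
     "net_income": 5.0, "revenue": 100.0},
    {"stage": "growth", "current_assets": 90.0, "current_liabilities": 100.0,
     "net_income": 3.0, "revenue": 100.0},
    {"stage": "mature", "current_assets": 500.0, "current_liabilities": 100.0,
     "net_income": 50.0, "revenue": 200.0},
]
report = benchmark(sme, peers)
```

Filtering peers by stage is one simple way to honour "the respective SME profile"; a real engine might weight peers by sector or size as well.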


5.3.35 Transaction (Txn) Monitoring


A dynamic complex event processing (CEP) mechanism that monitors the user's
transactions. If the amount or type of a transaction deviates from normal behaviour,
the user is informed of the abnormal transaction, in order to be safeguarded from
double-payment mistakes and potential fraud attempts. In addition, expense
patterns are analysed to identify potential savings, for instance multiple
subscription spending or high ATM fees.
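Two of the checks described above can be sketched as simple rules: an amount that deviates strongly from the user's history (a z-score test) and a possible double payment (same payee, same amount, within a few days). This is a hedged, rule-based stand-in rather than a CEP engine; the `Txn` record, the 3-sigma threshold and the 3-day window are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev


@dataclass(frozen=True)
class Txn:
    day: date
    payee: str
    amount: float


def is_amount_anomalous(history, txn, z_threshold=3.0):
    """Flag txn if its amount is more than z_threshold std devs from the historical mean."""
    amounts = [t.amount for t in history]
    if len(amounts) < 2:
        return False
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return txn.amount != mu
    return abs(txn.amount - mu) / sigma > z_threshold


def is_possible_double_payment(history, txn, window_days=3):
    """Flag txn if the same payee received the same amount within window_days."""
    return any(
        t.payee == txn.payee and t.amount == txn.amount
        and abs((txn.day - t.day).days) <= window_days
        for t in history
    )


history = [
    Txn(date(2024, 3, 1), "ACME Utilities", 55.0),
    Txn(date(2024, 3, 2), "Cafe", 50.0),
    Txn(date(2024, 3, 3), "Cafe", 45.0),
    Txn(date(2024, 3, 4), "Office Supplies", 60.0),
    Txn(date(2024, 3, 5), "Cafe", 40.0),
]
```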


5.3.36 Transaction (Txn) Categorization


Smart transaction auto-classification, which also allows the user to manually
override the assigned transaction category and define a new one (re-classify). The
categorization is based on the needs of the individual SME.
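The interplay between automatic classification and user re-classification can be sketched as follows. Keyword rules stand in for the "smart" classifier here (an assumption; the component may well use an ML model), and the class name and rule format are hypothetical. User overrides always take precedence and may introduce new categories.

```python
class TxnCategorizer:
    """Keyword-based auto-classification with per-SME user overrides (hypothetical rules)."""

    def __init__(self, rules: dict[str, list[str]]):
        self.rules = rules                    # category -> keywords in the description
        self.overrides: dict[str, str] = {}   # exact description -> user-chosen category

    def categorize(self, description: str) -> str:
        if description in self.overrides:     # a user re-classification always wins
            return self.overrides[description]
        text = description.lower()
        for category, keywords in self.rules.items():
            if any(k in text for k in keywords):
                return category
        return "uncategorized"

    def reclassify(self, description: str, category: str) -> None:
        """Let the user override the category, possibly introducing a new one."""
        self.overrides[description] = category


cat = TxnCategorizer({"utilities": ["electric", "water"], "travel": ["airline", "hotel"]})
```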


5.3.37 Invoice Processing


Processes transaction data and ERP data in order to keep a record of the invoices
that have been partially or fully paid by the SME. It provides insights regarding the
respective VAT amount payable and the optimization of cash flow, by providing
background information on invoices, such as paying an invoice at the “right” time.
Where available invoice data is limited, the engine uses a simplified approach to
derive the expected VAT amount to be paid by the SME at the next VAT due date,
based on the bank's transaction information and/or past VAT payment amounts.
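A hedged sketch of the two paths: when invoice data is available, standard VAT netting (output VAT on sales minus deductible input VAT on purchases); otherwise, the average of the most recent VAT payments as the "simplified approach". The averaging rule and the `last_n=4` window are assumptions, not the engine's documented method.

```python
from statistics import mean


def vat_from_invoices(sales_vat, purchase_vat):
    """Standard VAT netting: output VAT on sales minus deductible input VAT on purchases."""
    return sum(sales_vat) - sum(purchase_vat)


def estimate_next_vat(past_vat_payments, last_n=4):
    """Fallback when invoice data is limited: average of the most recent VAT payments."""
    if not past_vat_payments:
        raise ValueError("no VAT history available")
    return mean(past_vat_payments[-last_n:])


def expected_vat(sales_vat=None, purchase_vat=None, past_vat_payments=()):
    """Prefer invoice-based netting; fall back to the payment-history estimate."""
    if sales_vat is not None and purchase_vat is not None:
        return vat_from_invoices(sales_vat, purchase_vat)
    return estimate_next_vat(list(past_vat_payments))
```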
5.3.38 KMeans


Batch BDA Service: Given a set of observations (x1, x2, ..., xn), where each
observation is a d-dimensional real vector, k-means clustering aims to partition the
n observations into k (<n) sets S = {S1, S2, ..., Sk} so as to minimize the
within-cluster variance. It is a BDA service registered in the ALIDA catalogue: a
PySpark-based micro-service running in Spark cluster mode on K8S, working as part
of the ALIDA framework.
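The objective above is usually minimized with Lloyd's algorithm, which alternates an assignment step and a centroid-update step. The NumPy sketch below shows that idea on a toy dataset; it is an illustration only and is not the ALIDA service, which is PySpark-based. The random initialization and the convergence test are assumptions.

```python
import numpy as np


def assign(X, centroids):
    """Assignment step: index of the nearest centroid for every observation."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm for the within-cluster sum-of-squares objective."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    labels = assign(X, centroids)
    # Within-cluster sum of squared distances ("inertia")
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, inertia = kmeans(X, k=2)
```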


5.3.39 Random Forest (Model)


Batch BDA Service: An ensemble learning method for classification, regression and
other tasks that operates by constructing a multitude of decision trees at training
time and outputting the class that is the mode of the classes (classification) or the
mean prediction (regression) of the individual trees. Random decision forests correct
for decision trees' habit of overfitting to their training set. It is a BDA service
registered in the ALIDA catalogue: a PySpark-based micro-service running in Spark
cluster mode on K8S, working as part of the ALIDA framework.
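The "mean of the individual trees" behaviour for regression can be checked directly with scikit-learn, used here as a convenient stand-in for the PySpark service. One caveat worth knowing: for classification, scikit-learn's `RandomForestClassifier` averages the trees' class probabilities rather than taking a hard majority vote, so it can differ slightly from the strict "mode" described above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Each tree is fitted on a bootstrap sample; the forest averages their predictions.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
print(np.allclose(per_tree.mean(axis=0), forest.predict(X[:5])))
```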


5.3.40 Random Forest (Predict) 


A streaming random forest algorithm. It will be a BDA service registered in the
ALIDA catalogue: a PySpark-based micro-service running in Spark cluster mode on
K8S, working as part of the ALIDA framework.


5.3.41 Client Contextual Information 

The component generates and updates relevant contextual information about clients,
related to their data and behavior. It is built on open-source AI/ML frameworks.

5.3.42 Financial Fraud/Crime Risk Score

Using the clients' contextual information and transactions, the component evaluates
a risk score indicating whether a request for an online instant loan is fraudulent.


5.3.43 Anomaly Analysis


Anomaly Analysis provides two main functionalities:
* Anomaly detection
* Anomaly prediction for time series data.
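A minimal sketch of both functionalities on a univariate series, assuming a rolling z-score rule: detection flags a point that lies more than `threshold` standard deviations from the preceding window, and a simple form of prediction returns the band within which the next value is expected to fall. The window size, threshold and function names are assumptions, not the component's actual models.

```python
import numpy as np


def rolling_zscore_anomalies(series, window=20, threshold=3.0):
    """Indices whose value is more than `threshold` std devs from the mean
    of the preceding `window` values."""
    x = np.asarray(series, dtype=float)
    anomalies = []
    for i in range(window, len(x)):
        past = x[i - window:i]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and abs(x[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def expected_band(series, window=20, k=3.0):
    """Naive prediction: the band within which the next value is expected to fall."""
    past = np.asarray(series[-window:], dtype=float)
    return past.mean() - k * past.std(), past.mean() + k * past.std()


series = [10.0 + (i % 5) for i in range(60)]  # periodic "normal" behaviour
series[45] = 40.0                              # injected spike
```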
5.3.44 Pattern Analysis


Pattern Analysis provides two main functionalities:
* Pattern matching
* Pattern discovery
The component supports the detection of complex patterns in data graphs.
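As one concrete instance of pattern matching on a data graph, the sketch below finds the directed pattern a -> b -> c -> a, which in a transaction graph could correspond to a circular flow of funds. The choice of pattern and the edge-list input are assumptions for illustration; the component's pattern language and graph store are not described here.

```python
from collections import defaultdict


def find_cycles_of_length_3(edges):
    """Match the pattern a -> b -> c -> a in a directed graph.
    Each cycle is reported once, rotated so that its smallest node comes first."""
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
    cycles = set()
    for a in list(succ):
        for b in succ[a]:
            for c in succ.get(b, ()):
                if len({a, b, c}) == 3 and a in succ.get(c, ()):
                    cycles.add(min((a, b, c), (b, c, a), (c, a, b)))
    return sorted(cycles)


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
```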


5.3.45 Stream Story 


Stream Story is a component for the analysis of multivariate time series. It computes
and visualizes a hierarchical Markov chain model that captures the qualitative
behaviour of the system's dynamics, where the system is described by a group of
time series.
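The core of such a model is a first-order Markov chain over discrete system states. The sketch below shows a single level only (no hierarchy): multivariate observations are mapped to states by thresholding each variable, and transition probabilities are estimated from consecutive states. The thresholding rule is a hypothetical stand-in for however Stream Story actually defines its states.

```python
import numpy as np


def discretize(series_2d, thresholds):
    """Map each multivariate observation to a state id by thresholding each variable
    (hypothetical state definition)."""
    bits = (np.asarray(series_2d) > np.asarray(thresholds)).astype(int)
    return bits @ (2 ** np.arange(bits.shape[1]))


def transition_matrix(states, n_states):
    """Estimate P[i, j] = P(next state = j | current state = i) from a state sequence."""
    counts = np.zeros((n_states, n_states))
    for cur, nxt in zip(states[:-1], states[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that are never left keep an all-zero row instead of dividing by zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```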


5.3.46 Open API Gateway


This component provides the single point of entry for the added-value
functionalities of INFINITECH that are based on microservices (such as the Machine
Learning (ML)/Deep Learning (DL) analytics functionalities). It enables the
discovery and invocation of the dynamically registered microservices, effectively
handling the incoming requests towards these microservice instances.
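The registration, discovery and invocation responsibilities can be sketched with an in-process model in which service instances are plain callables and requests are spread across instances round-robin. This is a hypothetical illustration of the pattern: a real gateway would forward HTTP requests to remote instances, and the class and method names are not the INFINITECH API.

```python
import itertools
from typing import Callable, Dict, List

Handler = Callable[[dict], dict]


class ApiGateway:
    """Single entry point routing requests to dynamically registered service instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Handler]] = {}
        self._cursors: Dict[str, "itertools.cycle"] = {}

    def register(self, service: str, handler: Handler) -> None:
        self._instances.setdefault(service, []).append(handler)
        # Rebuild the round-robin cursor so the new instance is included
        self._cursors[service] = itertools.cycle(self._instances[service])

    def discover(self) -> List[str]:
        return sorted(self._instances)

    def invoke(self, service: str, request: dict) -> dict:
        if service not in self._cursors:
            raise KeyError(f"unknown service: {service}")
        handler = next(self._cursors[service])  # round-robin load balancing
        return handler(request)
```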


5.3.47 User Interface for Blockchain Transaction Reports and
Visualization Component 


This component will be responsible for providing user interaction with the Scalable
Transaction Graph Analysis component within the bank and for collecting/managing
user-provided as well as annotated blacklisted blockchain addresses. It will use
OpenAPIs (REST APIs) to submit queries consisting of customer blockchain addresses
and blacklists to the transaction graph analysis component, and will generate
web-based reports and visualizations based on the received results. Technologies:
OpenAPIs/REST APIs, web servers, JavaScript, and the Vis.js graph drawing library
(community version) (https://github.com/visjs-community/visjs-network).


5.3.48 Visualization Preparation 


Stream BDA Service: a service that prepares data for visualization depending on the
type of incoming data or the data to be viewed. It will be a microservice belonging
to the ALIDA core.
5.3.49 Real Time Visualization


A tool for displaying charts through a web application. Technologies: microservices
deployed through Kubernetes, Docker, and Data Source Connectors.


5.3.50 INFINISTORE


This is a GENERIC DATA STORE implementation for the INFINITECH Project,
provided as a microservice on top of a NoSQL DB (MongoDB) instance. It is fed by
different data ingestion servers and supports all other services. The microservice is
implemented as a Python Flask web server application that wraps the MongoDB
instance, and its APIs are defined with Swagger (OpenAPI 3.0).


5.3.51 UI Risk Assessment Based on VaR

A web application to monitor portfolio risk in real time and perform what-if
analysis, also providing several statistics on the underlying financial assets.
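The risk figure such a UI displays can be illustrated with historical-simulation Value at Risk: at 95% confidence, VaR is the loss implied by the 5th percentile of observed returns. The sketch also shows a what-if analysis as re-computing VaR for hypothetical portfolio weights. Historical simulation is an assumption here; the application may use parametric or Monte Carlo VaR instead.

```python
import numpy as np


def historical_var(returns, portfolio_value, confidence=0.95):
    """One-period historical-simulation VaR: the loss not exceeded with
    probability `confidence`, expressed in currency units."""
    r = np.asarray(returns, dtype=float)
    cutoff = np.percentile(r, 100 * (1 - confidence))  # e.g. the 5th percentile return
    return -cutoff * portfolio_value


def what_if(weights, asset_returns, portfolio_value, confidence=0.95):
    """What-if analysis: VaR of a hypothetical re-weighting of the portfolio.
    asset_returns has one row per period and one column per asset."""
    port_returns = np.asarray(asset_returns, dtype=float) @ np.asarray(weights, dtype=float)
    return historical_var(port_returns, portfolio_value, confidence)


returns = np.linspace(-0.05, 0.05, 101)  # toy daily returns
```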

5.3.52 Pseudo-anonymization Tool

A tool that pseudo-anonymizes data in order to preserve privacy. The component
needs a specific configuration/development for each pilot in which it is used.
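One common pseudonymization technique is a keyed hash (HMAC-SHA256): the same identifier and key always give the same pseudonym, so records remain linkable, while anyone without the key cannot recompute pseudonyms for guessed identifiers. The sketch assumes this technique (the tool's actual method is not stated), and the set of fields to protect plays the role of the per-pilot configuration.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 16) -> str:
    """Replace an identifier with a truncated HMAC-SHA256 of it."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def pseudonymize_record(record: dict, fields: set, secret_key: bytes) -> dict:
    """Pseudonymize only the configured fields; other fields pass through unchanged."""
    return {k: pseudonymize(str(v), secret_key) if k in fields else v
            for k, v in record.items()}
```

Keeping the key separate from the pseudonymized data is what distinguishes this from plain hashing, which can be reversed by hashing candidate identifiers.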

5.3.53 Health Insurance Risk Assessment Service


An algorithm yielding a risk estimate based on user RWD and a pre-trained
classifier, implemented as a Python script. The current assessment classifiers are
Random Forests or Logistic Regression (using scikit-learn for inference), or neural
networks (using TensorFlow for inference).


5.3.54 Health Insurance Fraud Detection Service 


An algorithm detecting fraudulent behavior of insurance company customers,
implemented as a Python script.


5.3.55 Well-being Outlook Classifiers 


Classifiers to be used by the health insurance risk assessment algorithm. The current 
set includes Random Forest, Logistic Regression and Neural Network classifiers of 
varying complexity. The format of the classifier depends on its type. 
5.3.56 Synthetic RWD for Well-being Analytics


Synthetic data of the same format as those collected in Pilot 12. The current version
of the data spans 1,000 people simulated for 116 weeks each.


5.3.57 Open Banking Aggregator Solution


Crowdpolicy Open Banking Aggregator Solution is a platform with a modular
architecture (UIs, connectors & APIs), so that it can be integrated into web/mobile
banking applications by the Bank's existing provider in the form of API integration,
but also offered as a separate application to the users of the Bank's online services.
It is compatible with the best-known market standards based on the European PSD2
Directive (Berlin Group, Open Banking UK, STET) and supports PISP & AISP
services based on the PSD2 European Directive:
* Payment Initiation Services
* Account Information Services.


5.3.58 Big Data Analytics Platform 


A platform that collects and processes information from multiple open data sources
regarding SMEs and applies cognitive algorithms to detect risks and changes in
financial needs. The tool will be used in Pilot 13.


5.4 Processes 


5.4.1 Semantics Streams Analytics Engine (SeSA-ME)


The INFINITECH Semantics Streams Analytics Engine (SeSA-ME) and the
related tools for enabling semantic data exchange are based on the development of
an interoperability (ontology-based) database/registry that supports linking diverse
systems and datasets through shared semantics, as well as semantically interoperable
analytics. The SeSA-ME system includes a visual SPARQL query editor, Swagger
APIs, and verification and visualization tools for novice users, while supporting full
access and control over the data mashups for expert users. Tied to the development
of the SeSA-ME platform is the development and deployment of the INFINITECH
Graph Data Model, which supports both the design and deployment of stream-based
web applications in a simple and intuitive way and the analytics services built on
stream-based applications and services. [D4.6]

Semantic Streams Analyzer Middleware-Engine (SSSAME). The SeSAME
component is a data mashup builder for the financial sector that can be used as a
data processing component in a data management application. It enhances the
capacity to process financial and insurance data in the form of batches and provides
a single output, which makes it ideal when multiple sources have different data
formats. It is built to be compatible with the most common data formats and
vocabularies in the financial and insurance sector, i.e. FIBO, FIGI and LKIF, and
additionally uses the INFINITECH Core Graph Data Model to enhance performance.
The SSSAME component is designed as a dataflow/workflow execution framework
that connects various data inputs/outputs through the concept of pipelines to create
the data mashup. Conceptually, each financial operator has input data or streams,
and SeSAME provides one output data set or stream: multiple inputs can be used
simultaneously, while a single output in RDF is provided. Only the final operator of a
workflow can return a format other than RDF, if necessary, by defining and
transforming the data into the desired format. Data operators come in three modes,
exposed via APIs:
* API (1): a data acquisition operator collects or receives data from data sources or
gateways and can be pull-based or push-based.
* API (2): a stream processing operator defines stream processing functionalities in a
declarative language, e.g., CQELS.
* API (3): a streaming operator streams the outputs of the final operator of a
workflow to the consuming applications.
In these three API modes, data transformation and alignment operators can be
applied to produce a normalised RDF output format. [D4.4 to D4.6]
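The operator pipeline described above can be sketched with Python generators: two acquisition operators read sources with different formats (CSV and JSON lines), a processing operator aligns their records into triples, and a final streaming operator merges everything into a single N-Triples-style output. This is an assumed, simplified illustration; the `http://example.org/` namespace and property names are placeholders rather than the INFINITECH Core Graph Data Model, and no CQELS engine is involved.

```python
import csv
import io
import json
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"  # placeholder namespace, not the INFINITECH data model


def acquire_csv(text: str) -> Iterator[dict]:
    """Data acquisition operator (pull-based) for a CSV source."""
    yield from csv.DictReader(io.StringIO(text))


def acquire_json(text: str) -> Iterator[dict]:
    """Data acquisition operator for a JSON-lines source."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def to_triples(records: Iterable[dict], id_field: str) -> Iterator[Triple]:
    """Processing operator: align heterogeneous records into a common triple form."""
    for rec in records:
        subject = f"<{EX}account/{rec[id_field]}>"
        for key, value in rec.items():
            if key != id_field:
                yield subject, f"<{EX}{key}>", json.dumps(str(value))


def stream_ntriples(*triple_streams: Iterable[Triple]) -> Iterator[str]:
    """Final streaming operator: merge all inputs into one output stream."""
    for stream in triple_streams:
        for s, p, o in stream:
            yield f"{s} {p} {o} ."
```

Usage: `stream_ntriples(to_triples(acquire_csv(csv_text), "id"), to_triples(acquire_json(json_text), "acct"))` yields one line per triple, regardless of which source format each record came from.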

Enhanced Distributed Reasoner over FinTech Ontologies (EnDoRFIN Semantic
Reasoner). The EnDoRFIN component is a tool for inferring knowledge from data
streams: it uses rules that define logical conditions, and the resulting logical
consequences are provided as outcomes. The inference rules are defined based on
the most commonly used financial and insurance vocabularies, i.e. FIBO, FIGI and
LKIF, and the rules are processed through APIs that define the logical descriptions
for the data applications. The component allows the use of other languages, but
these need to be upgraded to the target vocabulary. Additionally, EnDoRFIN uses
the INFINITECH Core Graph Data Model to ensure that the inference is applicable
to all the involved FinTech domains.
[D4.4 to D4.6]
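Rule-based inference of this kind can be pictured as forward chaining: rules are applied repeatedly until no new facts appear (a fixpoint). The sketch below is a minimal, assumed illustration with two RDFS-style rules over plain string triples; the class names are hypothetical and are not FIBO, FIGI or LKIF terms, and it does not reflect EnDoRFIN's actual rule language, APIs or distributed execution.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Forward chaining to a fixpoint with two RDFS-style rules:
    (1) a subClassOf b, b subClassOf c  =>  a subClassOf c
    (2) x type a, a subClassOf b        =>  x type b"""
    kb = set(facts)
    while True:
        sub = [(s, o) for s, p, o in kb if p == "subClassOf"]
        new = set()
        for a, b in sub:                      # rule 1: transitivity
            for b2, c in sub:
                if b == b2:
                    new.add((a, "subClassOf", c))
        for x, p, cls in kb:                  # rule 2: type inheritance
            if p == "type":
                for a, b in sub:
                    if a == cls:
                        new.add((x, "type", b))
        if new <= kb:                         # fixpoint: nothing new was derived
            return kb
        kb |= new


facts = {
    ("Loan", "subClassOf", "Debt"),
    ("Debt", "subClassOf", "FinancialInstrument"),
    ("loan42", "type", "Loan"),
}
```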


5.4.2 CI/CD


The “INFINITECH Way Foundations” summarizes the ways in which project partners
organize artefacts in the project's code repository and make use of the CI/CD
pipelines to i) make their solutions available to other partners and ii) automate the
deployment of a pilot. [Blueprint guidelines for the INFINITECH Way
Foundations deployments of project pilots and technologies]

It makes use of an imaginary pilot solution that has been implemented to validate
the process and can be used as a reference to help other pilots build their solutions.
This reference pilot uses two INFINITECH building blocks, the infinistore and the
lx-kafka, and a micro-service implemented specifically for the needs of that pilot.
The guidelines show how code and artefacts are organized: platform building
blocks, which are artefacts provided by INFINITECH, must go under the
corresponding groups in order to be available to all pilot solutions, while
pilot-specific solutions must go under separate groups. They explain how to create
new projects under specific groups, how to set up the Dockerfile that instructs the
CI process on how to build the image, and how to define the CI process itself by
means of a Jenkinsfile. Finally, they describe how to use the CD process to define
the artefacts that the solution consists of and how to automate the deployments.
[Blueprint guidelines for the INFINITECH Way Foundations deployments of
project pilots and technologies]

Unlike the CI pipelines, the CD pipelines are not triggered automatically. The
reason is that deploying all 16 pilot solutions on the AWS resources is not
affordable; deployments are therefore started manually, for validation purposes.
[Blueprint guidelines for the INFINITECH Way Foundations deployments of
project pilots and technologies]


5.4.3 KYC/KYB


The Know Your Customer (KYC)/Know Your Business (KYB) policies in
state-of-the-art financial relations require customer parties, whether individuals or
entire corporations, to undergo verification of their identity. Thus, each financial
organization is able to estimate the risks involved in sustaining a new
business-customer partnership. In this context, together with the wider finance field
of Anti-Money Laundering (AML) procedures, every financial institution establishes
KYC and KYB operations at the time it registers a new customer. [D4.8]

The KYC/KYB blockchain application implements such industry mechanisms using
blockchain technology as the underlying infrastructure. Large enterprise blockchain
networks provide security, immutability and controlled transparency, while offering
efficient task execution and effective transactions within the corporation and among
its members. Ultimately, blockchain technology simplifies the emerging use cases
and directly addresses issues that arise in them. [D4.7 to D4.9]

In particular, a KYC/KYB mechanism ensures that the identification and
verification of a customer comply with national and international regulations and
laws set by governments, commissions, central banks and financial associations. As
both the customer profile information and the relevant laws and rules are subject to
change over time, their update and maintenance become complicated. Moreover,
centralized systems are exposed to data protection and cyber-security risks, as
attacks become cheaper to launch and are led by increasingly sophisticated
adversaries year by year. [D4.7 to D4.9]

Blockchain technology, and particularly permissioned blockchain networks, can
secure the KYC and KYB processes through decentralization. Decentralization
mainly exploits the idea that the information is replicated across all network nodes,
so that sabotaging one or more nodes cannot harm the integrity of the information
and a single point of failure is avoided. In particular, permissioned blockchain
technology promises to keep sensitive information inside a private network that
only privileged parties can access, by insider invitation. Thus, the customer
information is kept safe on a private ledger that offers transparency to a privileged
group of legal network participants. Both the customer and the organization are
able to perform create, read, update, delete (CRUD) operations on the data under
pre-defined access control policies. The various features of permissioned
blockchains enable different policy applications, for instance separating legal parties
into a higher-privacy network running inside the initial private one. In this scenario,
improved privacy control and data immutability ensure legitimate customer data
protection and management, together with proper administration of this data by
financial enterprises. [D4.7 to D4.9]
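Two of the properties discussed above, permissioned write access and immutability, can be illustrated with a toy hash-chained ledger in which create/update/delete operations are appended as new entries rather than overwriting history. This is a single-process sketch under stated assumptions: it is not Hyperledger Fabric or the KYC/KYB application, the entry format is hypothetical, and replication across nodes and consensus are not modelled.

```python
import hashlib
import json


def _block_hash(prev, entry, by):
    payload = json.dumps({"prev": prev, "entry": entry, "by": by}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class PermissionedLedger:
    """Toy append-only, hash-chained ledger where only invited members may write.
    Updates and deletions are recorded as new entries, so history stays intact."""

    def __init__(self, members):
        self.members = set(members)
        self.chain = []  # blocks: {"prev", "entry", "by", "hash"}

    def append(self, member, entry):
        if member not in self.members:
            raise PermissionError(f"{member} is not a network participant")
        prev = self.chain[-1]["hash"] if self.chain else "0" * 64
        block = {"prev": prev, "entry": entry, "by": member,
                 "hash": _block_hash(prev, entry, member)}
        self.chain.append(block)
        return block["hash"]

    def verify(self):
        """Recompute every hash; tampering with any past entry breaks the chain."""
        prev = "0" * 64
        for block in self.chain:
            if block["prev"] != prev or _block_hash(prev, block["entry"], block["by"]) != block["hash"]:
                return False
            prev = block["hash"]
        return True

    def current_state(self, customer_id):
        """Replay create/update/delete entries to obtain the latest customer record."""
        state = None
        for block in self.chain:
            e = block["entry"]
            if e["customer"] == customer_id:
                state = None if e["op"] == "delete" else {**(state or {}), **e.get("data", {})}
        return state
```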

Figure 5.2 depicts the high-level architecture of the KYC/KYB solution. The
KYC/KYB system involves different stakeholders, including customers, financial
organizations, and financial institutions, and is ultimately connected with a
personalized blockchain network.


5.5 Self-assessment 


The INFINITECH Readiness Level (IRL) is a self-assessment method created to
evaluate and provide guidance during the adoption process of generic or specific
technology assets and/or technology conditions related to the deployment phase
of a pilot project. IRL's main objective is to act as a tool that helps and guides pilot
project leaders in identifying and selecting ready-to-go technologies. In particular,
IRL is also useful in the on-boarding process of the INFINITECH Way
Foundations, where technologies that have been developed in advance can be
adopted at any stage during the execution of the project pilots.
[Innovation Readiness Assessment]
Figure 5.2. High-level architecture of the KYC/KYB solution.


IRL is applicable to any program (i.e. INFINITECH pilot), at any stage, that
operates or wants to make use of the technology assets developed or implemented
for the financial and insurance sectors. IRL defines 5 levels, scored 1 to 5, where 1
is the lowest level and 5 is the highest. IRL can be used directly in the
INFINITECH Pentagon (i.e. Kiviat diagram), as it can be mapped directly within
the innovation process/radar, thereby facilitating the path towards achieving high
levels of innovation. [Innovation Readiness Assessment]

IRL gives any project (INFINITECH pilot) the capacity to self-assess its level of
adoption according to identified technological characteristics in different
technology domain areas, i.e. Data Modeling and Data Interoperability,
Infrastructure Deployment and Services Platform Adoption, Information
Management and Analytics, and Intelligent Applications. The technology domain
areas are defined by following the requirements from ICT experts for a full-stack
implementation, and the IRL method allows the self-assessment of the evolution of
the pilot project during the adoption or onboarding phase. IRL also adds a temporal
dimension to the onboarding process, which helps to self-assess not only the way
the project is being implemented but also the validity of the process followed.
[Innovation Readiness Assessment]

Figure 5.3 shows the INFINITECH Readiness Level (IRL) Self-Assessment
Chart. Its levels are described below.


* IRL Level 1: Covers the identification and definition of the basic conditions to
start a pilot, with as much re-use of INFINITECH baseline technologies as
possible. As a first step, other technologies outside the INFINITECH ecosystem
may be used, but with a clear design perspective for on-boarding INFINITECH
assets in the next iteration.

* IRL Level 2: Ready-to-go technologies for building a prototype in a
client-server model; a series of demonstrators can be available to test/prove the
deployment of basic functionalities. At this level integration is not required, but
data model integration and common access control tools are recommended.

* IRL Level 3: First deployed demonstrator with all functionalities in a
cloud-based deployment. The use of the INFINITECH Way Foundations and
Reference Architecture is the core of this level; demonstrators are essential to
explain high-level use cases, as is the use of data sets following the INFINITECH
Data Model or other best practices/standards in the particular domain of the pilot
demonstration.

* IRL Level 4: Cross-domain services deployed with DevOps, ready for easy
deployment. Data must be shared across different domains; this can be tested with
simple applications or integrated in the cross-domain query system. At this level all
the components are integrated following DevOps techniques and the orchestration
methods described in the INFINITECH Way Foundations.

* IRL Level 5: Implemented Proof of Concept (PoC) with interoperable
cross-domain services; DevOps and sandbox deployments are featured to facilitate
the inclusion of the assets and solutions in the INFINITECH marketplace.


Parameters | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Data Modeling | Vocabulary Identified | Taxonomy Ready | Data Model Logic & Physical | Validated Data Schema | Data Graph Ready
Data & Interoperability | Data Set Sample Ready | Data Storage Deployed | Query Data Tests Performed | Cross-Domain Query | Data Sharing/Exchange
Trustworthiness, Security & Privacy | Data Protection Methods | Access Control Tools | Identity Management | Platform Access Control | Self-Sovereign Identity
Infrastructure | Local Host | Client-Server Mode | Cloud Environment | Docker Ready | Scale Up Tested
Services Platform | Communication Services using Service APIs | Management Services Tools | Continuous Monitoring Methods | Services using Infinitech Orchestration | Infinitech Orchestration and DevOps Compliance
Applications | Use Cases | Prototype | Demonstrator with videos | Online Services | Marketplace Ready

Figure 5.3. INFINITECH readiness level (IRL) self-assessment chart.
INFINITECH Way Foundations Impact on Fintech and Insurance


6.1 INFINITECH Way Foundations Impact on Fintech and Insurance


The financial and insurance sectors have not yet adopted a unified, accepted way of
accessing and querying vast amounts of structured, unstructured, and
semi-structured data. It is envisioned that a semantic approach can increase data
interoperability and improve OLTP (On-Line Transactional Processing) databases,
OLAP (On-Line Analytical Processing) databases and data warehouses, which
would bring potential benefits to the financial sector. New technologies are having
a positive impact in all industries, and FinTechs are no exception: the effort and
cost associated with developing BigData analytics and AI systems for finance and
banking services are compensated by the number of opportunities and economic
benefits. [D4.4 to D4.7]

In recent years, the convergence of Internet technologies for communication,
computation and storage networks and services has been a clear trend in the
Information and Communications Technology (ICT) domain. Beyond data
fragmentation, there is also a lack of data interoperability across diverse datasets.
Semantic technologies can alleviate this issue by using semantic descriptions that
refer to the same data entities with similar (yet different) semantics as a way to
improve interoperability. The production of financial data is increasing, and so is
the demand for such information, with the number of information sources growing
exponentially. Tools and systems are therefore necessary that allow financial
information to be accessed and integrated in a systematic, standardised, and
cost-efficient manner. [D4.4 to D4.7]

Semantic web technologies are gaining relevance in financial systems where
information needs to be shared, making the information readily useful and helping
to solve many scalability issues. Consequently, remarkable efforts have been
invested to enable data interoperability, so that pieces of data can be plugged into
data infrastructures by directly exposing their own data semantics instead of the
data itself, facilitating exchange services. By introducing semantic technologies, the
INFINITECH project provides an overlay that is much easier to process and at the
same time minimizes the risk in processing data. This semantic layer approach also
constitutes the first step of the INFINITECH pipeline, i.e., gathering semantically
annotated data from provided and/or available datasets or data streams. In this
deliverable, we have described how the INFINITECH project would benefit from
semantic technologies like Linked Data and ontologies as the best practices in the
semantic interoperability building process. [D4.4 to D4.7]

Following semantic best practices, we have designed and implemented the
Semantic Stream Analytics Middleware-Engine (SeSA-ME) and analyzed the
existing ontologies that are related to the finance and insurance sectors and that can
be reused for our purposes in the INFINITECH project. The main ontologies used
as baselines are FIBO, FIGI and LKIF, because they focus on both the financial
sector and financial operations, containing the baseline for the metadata that
represent cross-domain and intra-domain financial transactions and operations,
with an attached effort towards standardization. The INFINITECH Core ontology
is an extension generated in the project that describes cross-domain vocabularies
used in multiple domains within the INFINITECH project domain areas; it is meant
to be complemented by other domain-specific vocabularies. For this reason, and
according to the initial requirements of the INFINITECH project, other
vocabularies specifically related to security and payments are presented.
[D4.4 to D4.7]

The project has successfully designed and used a holistic framework called 
INFINITECH Business Approach or “INFINITECH Way Foundations” to 
navigate through the design, development, and deployment of technology solu- 
tions in the financial services and insurance sectors. 
The project develops and validates a rich set of novel BigData and AI systems,
including BigData architecture, data store, Machine Learning (ML) algorithms,
anonymization and digital on-boarding technologies, decentralized systems for
digital finance, semantic interoperability technologies, etc.


In terms of foundation: 


* The INFINITECH reference architecture (RA) has been successfully completed,
and it has been shown how it applies to the 16 Pilots in development.

* The INFINISTORE database design has been completed, providing a unified and
integrated framework for handling real-time and offline data. It complements the
RA well.

* Significant work has been completed with respect to data governance, handling,
cleansing, pre-processing, and anonymization.

* A framework for semantic interoperability and data exchange across platforms,
services and datasets.

* The library of ML algorithms has been extended, including a finalized version of a
federated learning algorithm.

* The specification and prototype implementation of four blockchain prototypes for
digital financial applications.

* Existing technologies (Flink, Calcite, Kafka, Ethereum, Amazon WS, etc.) have
been successfully leveraged to address the specific challenges faced in the banking
and insurance sectors.


In terms of experimentation: 


* Multiple sandboxes have been developed and implemented, allowing testing and
experimentation with the 16 Pilots and various assets made available through the
INFINITECH marketplace.

* The virtual digital innovation hub (VDIH) has been integrated into the
marketplace and effectively launched.

* The INFINITECH marketplace has been revised and the web presence updated,
including numerous courses, workshops, and recorded webinars.


In terms of exploitation: 


* First drafts of business models have been developed for each of the 16 Pilots
(excluding the discontinued pilot 1), and the pilots have been evaluated by
stakeholders relying on the INFINITECH Pentagram.

* Work has been performed on identifying synergies among consortium members
by introducing the Joint Exploitation.
In terms of framework:


* Lessons learned have been identified for each of the pilots as part of continuous
quality improvement.


Also, the technology developments have an impact on the implicit training of 
different actors: 


* Business users, who have been trained in the statistical nature of the results
produced by such tools and, therefore, in human-AI interaction and cooperation.

* Business managers to assess the impact and specificity of the use of such tech- 
nologies. 

* Data scientists and software architects and developers that have been trained 
during the development of the present project. 


Fraud in financial services is an ever-increasing phenomenon, and cybercrime
generates multi-million revenues; therefore, even a small improvement in fraud
detection rates would generate significant savings. This viewpoint, built on
information-sharing activities currently running in the banking sector, is also
reinforced by trusted industry reports [8, 9, 10], with some surveys and reports
pointing to issues such as: “recover less than 25 percent of fraud losses”, “Increase
fraud typologies globally, from recent years, include identity theft and account
takeover, cyber-attack, card not present fraud and authorized push payment scams”,
“6 is the average number of frauds reported per company studied”, “56% asked
companies conducted an investigation into their worst fraud incident. many
organisations are failing to respond effectively”. These and other issues in these
reports demonstrate the importance of developing new technologies and
approaches, such as real-time analytics, to strengthen the fight against cyber fraud.
[pilot #10]

Figure 6.1 illustrates the INFINITECH impact creation roadmap. It starts with
initial technical development and valuation, followed by updated development and
initial market valuation, then the final INFINITECH results and the market launch,
and, last, sustainability and the wider use of INFINITECH.

Figure 6.2 shows the impact creation vehicles and dimensions. The impact
dimensions are industrial, open-source community, research and scientific, market
and commercial, and banks and financial institutions. These impacts are generated
through INFINITECH solutions and technologies, the INFINITECH VDIH, the
INFINITECH community and INFINITECH exploitation stories.
Figure 6.2. Impact creation vehicles and dimensions.
Figure 6.3. INFINITECH market impact.


Figure 6.3 summarizes the INFINITECH market impact. INFINITECH has an impact on the banks and financial organizations participating in the INFINITECH project, since it is used by more than fifteen banks and financial institutions (pilots). The INFINITECH project results were presented to more than ten banks and financial organizations outside INFINITECH, and INFINITECH assets are used by more than ten FinTechs and small and medium-sized enterprises (SMEs).


INFINITECH also has industrial impact (as shown in Figure 6.4). Nine testbeds support the INFINITECH Way Foundations, including the INFINITECH Reference Architecture (INFINITECH-RA). These testbeds provide the means for rapid product development in the light of changing and heterogeneous requirements. Furthermore, INFINITECH developed mature, innovative BigData solutions, such as the INFINITECH library of AI/ML algorithms. The project also devised innovative blockchain systems, such as graph mining and a personal data market combining federated learning with blockchain.

Figure 6.4. INFINITECH industrial impact.

INFINITECH has open-source community impact as well. It implements the ERC1155 smart contract for Hyperledger Fabric in Go (see Figure 6.5). Following this multi-token standard, a single contract can manage both fungible tokens and non-fungible tokens, such as artworks and tickets. The implementation also provides operations such as transfer, minting, burning and batch transfer of tokens.
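To make the multi-token idea concrete, the following is a minimal in-memory model of ERC1155-style balances, sketched in Python for brevity rather than Go. It is not the INFINITECH chaincode and does not use any Hyperledger Fabric API; the class and method names (`MultiTokenLedger`, `mint`, `burn`, `transfer`, `batch_transfer`) are hypothetical and only mirror the operations listed above. The central design point of the standard is that one contract keeps a balance per (token ID, account) pair, so a token ID minted with a supply of one behaves as a non-fungible token (e.g. an artwork), while a larger supply behaves as a fungible token.

```python
from collections import defaultdict


class MultiTokenLedger:
    """Hypothetical in-memory model of ERC1155-style multi-token balances."""

    def __init__(self):
        # balances[token_id][account] -> amount held
        self.balances = defaultdict(lambda: defaultdict(int))

    def balance_of(self, account, token_id):
        return self.balances[token_id][account]

    def mint(self, to, token_id, amount):
        # Create new units of a token ID and credit them to an account.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[token_id][to] += amount

    def burn(self, owner, token_id, amount):
        # Destroy units held by an account.
        if amount <= 0 or self.balances[token_id][owner] < amount:
            raise ValueError("insufficient balance to burn")
        self.balances[token_id][owner] -= amount

    def transfer(self, sender, recipient, token_id, amount):
        # Move units of a single token ID between accounts.
        if amount <= 0 or self.balances[token_id][sender] < amount:
            raise ValueError("insufficient balance to transfer")
        self.balances[token_id][sender] -= amount
        self.balances[token_id][recipient] += amount

    def batch_transfer(self, sender, recipient, token_ids, amounts):
        # Move several token IDs at once; validate everything first so the
        # batch is all-or-nothing, as a reverted transaction would be.
        if len(token_ids) != len(amounts):
            raise ValueError("token_ids and amounts must have equal length")
        needed = defaultdict(int)
        for token_id, amount in zip(token_ids, amounts):
            if amount <= 0:
                raise ValueError("amount must be positive")
            needed[token_id] += amount
        for token_id, total in needed.items():
            if self.balances[token_id][sender] < total:
                raise ValueError("insufficient balance for batch")
        for token_id, amount in zip(token_ids, amounts):
            self.balances[token_id][sender] -= amount
            self.balances[token_id][recipient] += amount


if __name__ == "__main__":
    ledger = MultiTokenLedger()
    ledger.mint("alice", "COIN", 100)      # fungible: supply of 100
    ledger.mint("alice", "ARTWORK-1", 1)   # non-fungible: supply of 1
    ledger.batch_transfer("alice", "bob", ["COIN", "ARTWORK-1"], [40, 1])
    ledger.burn("bob", "COIN", 10)
    print(ledger.balance_of("alice", "COIN"))      # 60
    print(ledger.balance_of("bob", "COIN"))        # 30
    print(ledger.balance_of("bob", "ARTWORK-1"))   # 1
```

Validating the whole batch before mutating any balance keeps the model consistent with the standard's behaviour, where a failed batch transfer changes nothing.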

INFINITECH has scientific impact as well (see Figure 6.6). The project produced an open-access book, which was downloaded 17,000 times (in a week), an average of 1,000 per day, and viewed almost 11,00 times through various LinkedIn posts. Further scientific impact comes from scientific publications, conferences and open-source results. The scientific impact also includes the three open-access books of the INFINITECH series.

For many years, digitization has been changing financial services, moving the industry from core banking to multichannel banking across different types of devices [1]. For several years now, the waves of digitization, financial technology (FinTech) and insurance technology (InsuranceTech) have been rapidly transforming the financial and insurance services industry [3]. This is illustrated, for instance, by the rapid growth of FinTech start-ups: McKinsey tracked more than 2,000 FinTech start-ups in 2016 and expected many more to remain undetected [1]. Moreover, FinTech investments grew from 1.8 bn USD in 2011 to more than 30 bn USD in 2018, a compound annual growth rate (CAGR) of about 50% per annum [2]. [D2.3 and D2.4]
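As a quick check of the growth figure, the compound annual growth rate over the seven years from 2011 to 2018 can be worked out from the two investment totals. This sketch uses only the figures in the text and treats "more than 30 bn USD" as exactly 30 bn USD, so the result is a lower bound:

```latex
\mathrm{CAGR} = \left(\frac{V_{2018}}{V_{2011}}\right)^{1/(2018-2011)} - 1
              = \left(\frac{30}{1.8}\right)^{1/7} - 1
              \approx 16.67^{1/7} - 1
              \approx 1.495 - 1
              \approx 0.49
```

That is, roughly 49 to 50% per annum, in line with the figure reported in [2].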

FinTechs are viewed in different ways: they act both as attackers of incumbents and as enhancements to them. Furthermore, FinTechs are part of large ecosystems, e.g. within the Alibaba platform, and sell infrastructure, e.g. for open banking [2]. In challenging times, or at the end of a market cycle when the market is turning downward, new measures are required to maintain steady growth. Analysts such as McKinsey [4] suggest that banks explore the following levers for organic growth [D2.3 and D2.4]:


(1) risk management based on powerful analytical tools to prepare for a down- 
turn; 

(2) productivity, using modular utilities to materially change cost structures; 
and 

(3) revenue growth through an improved customer experience (CX), bringing 
a larger customer base and/or share of wallet. 


Essential to exploiting these profitability levers are the critical enablers of advanced data analytics and talent. AI shows great promise in this field, especially given the progress in modelling techniques and methods.

This progress will facilitate the move to new data sources, e.g. IoT data supplementing traditional big data analytics in FinTech [2]. From McKinsey's point of view, rapidly scaling advanced analytics and AI tools is key to successful growth [4]. For instance, machine learning models can improve predictive accuracy in identifying the riskiest potential customers by up to 35%. [D2.3 and D2.4]

According to Juniper Research [5] “Technologies such as machine learning and 
blockchain are having a transformative effect on fintech, fundamentally altering the way 
financial services are delivered and driving fintech platforms to become the ‘new nor- 
mal’. Such technologies will make new use cases mainstream, including smart contracts, 
loan underwriting using AI to analyze non-traditional data sources, and personalized 
insurance policies based on IoT-generated data.”

The highest impact of technology in financial and insurance services is expected in the focus areas of INFINITECH [4, 5]:


* Lending & Financing, i.e. Credit Risks (Pilots 1 & 2).

* Wealth Management and Risk Assessment (Pilots 3 & 4).

* Customer Experience and Payment (Pilots 5 & 6).

* Regulation and Compliance (Pilots 7 to 10).

* Usage-Based Insurance, which includes personalized insurance products (Pilots 11 to 14).

* Banking.
Figure 5.3 illustrates the INFINITECH impact in its focus areas. In the banking area, INFINITECH induces banks to offer a better customer experience, and in this way incumbents improve their digital presence. Customers have a limited appetite to switch providers; therefore, incumbents are expected to remain the sole providers of current (checking) accounts for customers and businesses. In the area of payments, there is a need for quicker and more convenient payment methods among customers and businesses. This increases competition among service providers, prompting them to make a land grab as soon as possible. Technology firms are expected to add payments to their ecosystems, providing convenience and drawing on a ready-made installed base of customers.

In the lending and financing area, firms have novel sources for assessing applicants. Many new entrants target niches or improve the customer experience. Suppliers in this area keep expanding the market to include those previously excluded from financial services, and FinTech suppliers continue to keep ahead of incumbents by catering to niches. In insurance, analytics and new technologies are a good fit with consumers' desire for personalized services. Incumbent insurers invest heavily in new product areas, while barriers to entry for new start-ups remain low.

There is an ever-increasing number of niche areas to serve. Insurtech business models are becoming the new normal, as incumbents are able to replicate them. Wealth management is another area in which INFINITECH has a major impact. New wealth management offerings appeal to millennials who are looking for new ways to look after their investments. The market is becoming crowded as consumer-oriented banks enter it.

In its current state, the market is considered to serve the wealthy, which is why suppliers should consider broadening it across income levels. Trust remains the most important issue in this market. As traditional banks begin to invest in this market, standalone firms feel the pressure in their everyday business. Moreover, as Figure 6.7 shows, Regulation and Compliance (e.g. financial crime, money laundering and fraud) offers strong opportunities for disruption. This area is covered by Pilots 7 to 10.

As a side effect of the digital transformation in Financial Services, the trend 
towards persistent digital identities is accelerating. Indeed, “this is due to multiple 
points of failure in conventional identification and verification processes, particularly for 
online payment details but also in a variety of other sectors. Passwords and centralized 
repositories have both been highlighted as the core issue within the growing problem of 
identity fraud, and a variety of approaches have arisen to combat this.” [6] 


[Figure: Impact on Fintech and Insurance. INFINITECH Way Foundations.]




6.2 The Process of Adopting INFINITECH Way Foundations


The INFINITECH Way Foundations have been adopted by 16 pilots of the INFINITECH project, which are listed below under five categories.


1. Configurable and Personalized Insurance Products

* Pilot #13: Configurable and Personalized Insurance Products for SMEs
* Pilot #14: Big Data and IoT for the Agricultural Insurance Industry


2. Personalized Retail and Investment Banking Services

* Pilot #3: Collaborative Customer-centric Data Analytics for Financial Services
* Pilot #4: Personalized Portfolio Management: Mechanism for AI-based Portfolio Construction
* Pilot #5A: Smart and Personalized Pocket Assistant for Personal Financial Management
* Pilot #5B: Business Financial Management (BFM) Tools Delivering Smart Business Advice
* Pilot #6: Personalized and Intelligent Investment Portfolio Management for Retail Customers
* Pilot #15: Open Inter-banking Pilot


3. Personalized Usage-Based Insurance Products

* Pilot #11: Personalized Insurance Products Based on IoT-connected Vehicles
* Pilot #12: Real World Data for Novel Insurance Products


4. Predictive Financial Crime and Fraud Detection Pilots

* Pilot #7: Avoiding Financial Crime
* Pilot #8: Platform for Anti Money Laundering Supervision (PAMLS)
* Pilot #9: Analysing Blockchain Transaction Graphs for Fraudulent Activities
* Pilot #10: Real-time Cybersecurity Analytics on Financial Transactions' Data
* Pilot #16: Data Analytics Platform to Detect Payment Anomalies Linked to Money Laundering Events


5. Smart, Reliable and Accurate Risk and Scoring Assessment Pilots

* Pilot #1: Invoices Processing Platform for a more Sustainable Banking Industry
* Pilot #2: Real-time Risk Assessment in Investment Banking
This book provides an easy way to understand the design principles and the overall process of the INFINITECH project; it can be considered a handbook of the INFINITECH Way. With ever-increasing changes in the finance and insurance landscape, technologies for more human-centric applications are being developed, and FinTech and InsuranceTech business models are being redefined. We explain the INFINITECH Way in simple words, enhancing the state of the art by introducing its design principles and basic metrics, and we demonstrate the practicality and applications of the INFINITECH Way in multiple areas.

This book describes and explains the advantages of the INFINITECH Way in the on-boarding process, which provides speed and agility in optimizing the design and implementation of a financial services reference architecture. The INFINITECH Way mitigates the problems arising from complex vendor lock-in and/or proprietary infrastructure. For Big Data, IoT and AI applications in the financial and insurance sectors, we also expand on how the INFINITECH Way is applied to validate a reference implementation.

This book describes the core elements of the INFINITECH Way, including the INFINITECH Reference Architecture (RA). A reference architecture for BigData systems in digital finance can greatly help stakeholders in structuring, designing, developing, deploying and operating BigData, AI and IoT solutions. It serves as a communication device among stakeholders, while at the same time providing a range of best practices that can accelerate the development and deployment of effective systems. This book has introduced the first version of such a reference architecture, namely the INFINITECH-RA. The INFINITECH-RA adopts the concept and principles of data pipelines, which are in line with the state of the art in BigData and Artificial Intelligence systems. It is also in line with the principles of the reference architecture of the BDVA (Big Data Value Association).

In practice, INFINITECH-RA extends and customizes the BDVA RA with con- 
structs that permit its use for digital finance use cases. The INFINITECH-RA 
defines the structuring principles that drive the integration of the INFINITECH 
technical components and technologies. 

Furthermore, this book explains how the INFINITECH-RA can be used to support the development of common BigData/AI pipelines for digital finance applications, and provides some examples of popular pipelines. It also provides an initial mapping of most pilots of the project to the INFINITECH-RA, using INFINITECH components and technologies. The presented RA will serve as a basis for implementing the initial versions of the MVPs (Minimum Viable Products) of the pilots. These implementations will allow the collection of stakeholders' feedback regarding the RA. Based on this feedback, the next versions/releases of the INFINITECH-RA will be produced, including updates to the different views of the architecture. Furthermore, future releases of the INFINITECH-RA will consider a broader range of the BigData, IoT and AI related capabilities developed in INFINITECH.

This first book of the series describes the methodology applied for the INFINITECH-RA, which is based on review and analysis, architecture design, and fine-tuning and updates. Further, we discuss general AI/BigData challenges, such as different types of data sources, accessing heterogeneous data sources, running analytics over stale data, analyzing real-time data to respond to events and create alerts and notifications as events occur, and the existence of a lock between operational and analytical operations. We also address challenges specific to the financial sector: siloed data and business operations, real-time performance requirements, mobility, multiple-channel management, automation and transparency are among the many challenges that the INFINITECH Way tackles.

The INFINITECH data pack, i.e. the set of files, schemas and metadata model diagrams (graphs) that represent the way INFINITECH data is organized and structured, is summarized and described in detail, and references to the relevant ontologies available in the market are included. Furthermore, all relevant taxonomies and vocabularies from the INFINITECH Core, Financial Industry Business Ontology (FIBO), Financial Instrument Global Identifier (FIGI) and Legal Knowledge Interchange Format (LKIF) domains used in INFINITECH are delineated and cited for further reference. Moreover, an example based on the Traffic Analysis Hub Ontology (TAHO) is provided to further illustrate the case. Finally, a detailed list of INFINITECH technologies, data and processes is included.

This book details the INFINITECH Readiness Level (IRL), a self-assessment method created to evaluate and provide guidance during the adoption of generic or specific technology assets and/or technology conditions related to the deployment phase of a pilot project. The IRL's main objective is to act as a tool for project pilot leaders, helping and guiding them to identify and select ready-to-go technologies. In particular, the IRL is also useful in the on-boarding process of the INFINITECH Way Foundations, where technologies that have been developed in advance can be adopted at any stage during the execution of the project pilots.

Furthermore, the impact of the INFINITECH Way Foundations on FinTech and insurance is described. It is important to note that the financial and insurance sectors do not yet have an adopted and accepted unified way of accessing and querying vast amounts of structured, unstructured and semi-structured data. In this respect, the INFINITECH Way provides a practical solution to real-world problems. The impact of the INFINITECH Way on several categories, including foundation, experimentation, exploitation and framework, is summarized.

Finally, the process of adopting the INFINITECH Way Foundations is described. Currently, the INFINITECH Way Foundations are adopted by 16 pilots of the INFINITECH project, grouped under five categories. Each pilot is a showcase of how the INFINITECH Way can be adopted by practical projects in the field.